NEISSERIAL ANTIGENS

This invention relates to antigens from Neisseria bacteria. 
BACKGROUND ART 

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, gram negative diplococci that
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and,
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the polysaccharide capsule that is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In:
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker,
New York, 1997, pp. 817-842). The disease causes significant morbidity but limited mortality.
Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for this vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman
et al. (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med 337(14):970-976).
In developing countries, endemic disease rates are much higher and during epidemics
incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely high, at
10-20% in the United States, and much higher in developing countries. Following the introduction
of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major cause of
bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra).

Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in
the United States and developed countries. The meningococcal vaccine currently in use is a
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although
efficacious in adolescents and adults, it induces a poor immune response and short duration of
protection, and cannot be used in infants [eg. Morbidity and Mortality Weekly Report, Vol. 46, No.
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak
immune response that cannot be boosted by repeated immunization. Following the success of the
vaccination against H. influenzae, conjugate vaccines against serogroups A and C have been
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate
vaccine against meningococcus A and C. Vaccine 10:691-698).

Meningococcus B remains a problem, however. This serotype currently is responsible for
approximately 50% of total meningitis in the United States, Europe, and South America. The
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer
of α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective
immune response, the capsular polysaccharide has, for instance, been chemically modified by
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine
candidates: capsular or non-capsular? Clin Microbiol Rev 7(4):559-575).

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or deleted of the class 4
OMPs that are believed to induce antibodies that block bactericidal activity. This approach
produces vaccines that are not well characterized. They are able to protect against the homologous
strain, but are not effective at large where there are many antigenic variants of the outer membrane
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine different
porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal vaccine.
Infect. Agents Dis. 4:13-28). Additional proteins to be used in outer membrane vaccines have been
the opa and opc proteins, but none of these approaches have been able to overcome the antigenic
variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding proteins 1
and 2 are both surface exposed and generate bactericidal antibodies capable of killing homologous
and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against meningococcus
B, some could be components of vaccines against all meningococcal serotypes, and others could
be components of vaccines against all pathogenic Neisseriae.

THE INVENTION 

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the
examples. These sequences relate to N. meningitidis or N. gonorrhoeae.

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence,
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These
homologous proteins include mutants and allelic variants of the sequences disclosed in the
examples. Typically, 50% identity or more between two proteins is considered to be an indication of
functional equivalence. Identity between the proteins is preferably determined by the Smith-Waterman
homology search algorithm as implemented in the MPSRCH program (Oxford Molecular), using an
affine gap search with parameters gap open penalty=12 and gap extension penalty=1.
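The identity calculation described above can be illustrated with a minimal sketch of a Smith-Waterman local alignment using affine gap penalties (gap open 12, gap extension 1). This is not the MPSRCH implementation. The function names, the simple match/mismatch scores (used here in place of an amino acid substitution matrix), and the convention that the first gapped position costs the gap open penalty are all assumptions made for illustration only.

```python
NEG_INF = float("-inf")


def extend(cell, penalty):
    """Add one gap column to a (score, identities, columns) cell."""
    score, identities, columns = cell
    return (score - penalty, identities, columns + 1)


def local_identity(a, b, match=5, mismatch=-4, gap_open=12, gap_ext=1):
    """Return (score, fraction identity) of the best local alignment.

    Assumptions (hypothetical, not taken from MPSRCH): simple match and
    mismatch scores replace a substitution matrix, and a gap of length k
    costs gap_open + (k - 1) * gap_ext.
    """
    zero = (0, 0, 0)            # (score, identical columns, total columns)
    closed = (NEG_INF, 0, 0)    # state that cannot be reached yet
    h_prev = [zero] * (len(b) + 1)      # best alignment ending at (i-1, j)
    f_prev = [closed] * (len(b) + 1)    # alignments ending in a vertical gap
    best = zero
    for i in range(1, len(a) + 1):
        h_cur = [zero] * (len(b) + 1)
        f_cur = [closed] * (len(b) + 1)
        e = closed                       # alignments ending in a horizontal gap
        for j in range(1, len(b) + 1):
            e = max(extend(h_cur[j - 1], gap_open), extend(e, gap_ext))
            f_cur[j] = max(extend(h_prev[j], gap_open),
                           extend(f_prev[j], gap_ext))
            same = a[i - 1] == b[j - 1]
            d = h_prev[j - 1]
            diag = (d[0] + (match if same else mismatch),
                    d[1] + int(same), d[2] + 1)
            # A local alignment may restart anywhere, hence the zero option.
            h_cur[j] = max(zero, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    score, identities, columns = best
    return score, (identities / columns if columns else 0.0)


if __name__ == "__main__":
    score, identity = local_identity("MKTAYIAKQR", "MKTAYLAKQR")
    print(f"score={score} identity={identity:.0%}")
```

Under these assumptions, a pair of sequences would meet the preferred threshold when the returned fraction is greater than 0.5. A production comparison would use a substitution matrix and an established alignment tool.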

The invention further provides proteins comprising fragments of the Neisserial amino acid
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12,
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence.
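As an illustration of the "at least n consecutive amino acids" criterion, the following sketch enumerates every window of n residues in a reference sequence and checks whether a candidate protein contains any of them. The function names and the example sequences are hypothetical and are not taken from the examples.

```python
def fragments(sequence, n=7):
    """Yield every run of n consecutive residues in the sequence."""
    for start in range(len(sequence) - n + 1):
        yield sequence[start:start + n]


def comprises_fragment(candidate, reference, n=7):
    """True if candidate contains at least n consecutive residues of reference."""
    return any(window in candidate for window in fragments(reference, n))


if __name__ == "__main__":
    reference = "MKTAYIAKQR"            # hypothetical sequence
    print(list(fragments(reference, 7)))
    print(comprises_fragment("GGGKTAYIAKGGG", reference, 7))
```

Any longer shared run necessarily contains a shared run of exactly n residues, so testing windows of length n is sufficient for the "n or more" condition.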

The proteins of the invention can, of course, be prepared by various means (eg. recombinant
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native,
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These
may be polyclonal or monoclonal and may be produced by any suitable means.

According to a further aspect, the invention provides nucleic acid comprising the Neisserial
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide
sequences disclosed in the examples.

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1xSSC,
0.5% SDS solution).

Nucleic acid comprising fragments of these sequences is also provided. These should comprise
at least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention. 

It should also be appreciated that the invention provides nucleic acid comprising sequences
complementary to those described above (eg. for antisense or probing purposes).
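A complementary sequence of the kind referred to above can be derived as in the following minimal sketch, which assumes an unambiguous DNA sequence written 5' to 3'. The function name and example sequence are hypothetical. RNA and ambiguity codes are deliberately not handled.

```python
# Watson-Crick pairing for unambiguous DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    try:
        return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))
    except KeyError as err:
        raise ValueError(f"unsupported base: {err.args[0]!r}") from None


if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check when designing probes.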

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical 
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various 
forms (eg. single stranded, double stranded, vectors, probes etc.). 

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as
those containing modified backbones, and also peptide nucleic acids (PNA) etc.

According to a further aspect, the invention provides vectors comprising nucleotide sequences of
the invention (eg. expression vectors) and host cells transformed with such vectors.

According to a further aspect, the invention provides compositions comprising protein, antibody,
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines,
for instance, or as diagnostic reagents, or as immunogenic compositions.

The invention also provides nucleic acid, protein, or antibody according to the invention for use
as medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid,
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain
B or strain C).

The invention also provides a method of treating a patient, comprising administering to the patient
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the
invention.

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a 
host cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein
or nucleic acid is synthesised in part or in whole using chemical means.

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a)
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes.

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting
an antibody according to the invention with a biological sample under conditions suitable for the
formation of antibody-antigen complexes; and (b) detecting said complexes.

A summary of standard techniques and procedures which may be employed in order to perform the
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows.
This summary is not a limitation on the invention but, rather, gives examples that may be used, but
are not required.

General

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook
Molecular Cloning: A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and
II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames
& S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the
Methods in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene
Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor
Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular
Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice,
Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C.C. Blackwell eds. 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

All publications, patents, and patent applications cited herein are incorporated in full by reference. 
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1, 
9726147.3, 9800759.4, and 9819016.8 are incorporated herein. 

Definitions 

A composition containing X is "substantially free" of Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of
X+Y in the composition, more preferably at least about 95% or even 99% by weight.
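The weight-based definition above reduces to a simple ratio, X/(X+Y). The following sketch applies the 85% threshold and the preferred 90%, 95% and 99% levels. The function names and the example weights are hypothetical, and the weights may be in any unit provided both use the same one.

```python
def fraction_x(weight_x, weight_y):
    """Fraction by weight of X in the total X+Y."""
    total = weight_x + weight_y
    if weight_x < 0 or weight_y < 0 or total == 0:
        raise ValueError("weights must be non-negative and not both zero")
    return weight_x / total


def substantially_free(weight_x, weight_y, threshold=0.85):
    """True if X makes up at least `threshold` of X+Y by weight."""
    return fraction_x(weight_x, weight_y) >= threshold


if __name__ == "__main__":
    # Hypothetical preparation: 9.6 mg of X with 0.4 mg of contaminant Y.
    for level in (0.85, 0.90, 0.95, 0.99):
        print(level, substantially_free(9.6, 0.4, level))
```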

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising"
X may consist exclusively of X or may include something additional to X, such as X+Y.

The term "heterologous" refers to two biological components that are not found together in nature.
The components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two epitopes
from the same or different proteins which have been assembled in a single protein in an
arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous
unit of polynucleotide replication within a cell, capable of replication under its own control. An
origin of replication may be needed for a vector to replicate in a particular host cell. With certain
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the viral T-antigen, effective in COS-7
cells.

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having 
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the 
degree of sequence identity between the native or disclosed sequence and the mutant sequence is 
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the 
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic 
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid 
molecule, or region, that occurs essentially at the same locus in the genome of another or second 
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has 
a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes 
a protein having similar activity to that of the protein encoded by the gene to which it is being 
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of
the gene, such as in regulatory control regions (eg. see US patent 5,753,235). 

Expression systems 

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems; 
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast. 



WO 99/24578 PCT/IB98/01665 


i. Mammalian Systems 

Mammalian expression systems are known in the art. A mammalian promoter is any DNA 
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3') 
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a 
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription 
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at 
the correct site. A mammalian promoter will also contain an upstream promoter element, usually 
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element 
determines the rate at which transcription is initiated and can act in either orientation [Sambrook 
et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A 
Laboratory Manual, 2nd ed.].

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include
the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late
promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from
non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences.
Expression may be either constitutive or regulated (inducible); depending on the promoter,
expression can be induced with glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with
synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed
upstream or downstream from the transcription initiation site, in either normal or flipped
orientation, or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987)
Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements
derived from viruses may be particularly useful, because they usually have a broader host range.
Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc. Natl. Acad. Sci. 79:6777] and from human cytomegalovirus [Boshart
et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and become active only
in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired,
the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing
sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein
in mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 14:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence are put together into expression constructs. Enhancers, introns
with functional splice donor and acceptor sites, and leader sequences may also be included in an
expression construct, if desired. Expression constructs are often maintained in a replicon, such as
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 23:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol Cell Biol 9:946] and pHEBO [Shimizu et al.
(1986) Mol Cell Biol 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for
introduction of heterologous polynucleotides into mammalian cells are known in the art and include
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection,
protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct
microinjection of the DNA into nuclei.

Mammalian cell lines available as hosts for expression are known in the art and include many
immortalized cell lines available from the American Type Culture Collection (ATCC), including
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK)
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a
number of other cell lines.

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector,
and is operably linked to the control elements within that vector. Vector construction employs
techniques which are known in the art. Generally, the components of the expression system include
a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus
genome, and a convenient restriction site for insertion of the heterologous gene or genes to be
expressed; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous recombination of the heterologous gene into
the baculovirus genome); and appropriate insect host cells and growth media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the
wild type viral genome are transfected into an insect host cell where the vector and viral genome
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques
are identified and purified. Materials and methods for baculovirus/insect cell expression systems
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers
and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above
described components, comprising a promoter, leader (if desired), coding sequence of interest, and
transcription termination sequence, are usually assembled into an intermediate transplacement
construct (transfer vector). This construct may contain a single gene and operably linked regulatory
elements; multiple genes, each with its own set of operably linked regulatory elements; or multiple
genes, regulated by the same set of regulatory elements. Intermediate transplacement constructs are
often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of stable
maintenance in a host, such as a bacterium. The replicon will have a replication system, thus allowing
it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 17:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann.
Rev. Microbiol., 42:177) and a prokaryotic ampicillin-resistance (amp) gene and origin of
replication for selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and
a transcription initiation site. A baculovirus transfer vector may also have a second domain called
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either
regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedrin protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene,
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129;
human IL-2, Smith et al., (1985) Proc. Natl. Acad. Sci. USA, 82:8404; mouse IL-3, Miyajima et
al., (1987) Gene 58:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused
foreign proteins usually requires heterologous genes that ideally have a short leader sequence
containing suitable translation initiation signals preceding an ATG start signal. If desired,
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with
cyanogen bromide.

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted
from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised
of a leader sequence fragment that provides for secretion of the foreign protein in insects. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the translocation of the protein into the endoplasmic reticulum.

After insertion of the DNA sequence and/or the gene encoding the expression product precursor
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer
vector and the genomic DNA of wild type baculovirus, usually by co-transfection. The promoter
and transcription termination sequence of the construct will usually comprise a 2-5 kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art. (See Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol Cell Biol (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene.
Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in place of the polyhedrin
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is
positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1%
and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type virus.
Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression
system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein,
which is produced by the native virus, is produced at very high levels in the nuclei of infected cells
at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also
contain embedded particles. These occlusion bodies, up to 15 µm in size, are highly refractile,
giving them a bright shiny appearance that is readily visualized under the light microscope. Cells
infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from
wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by
techniques known to those skilled in the art. Namely, the plaques are screened under the light
microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant
virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8
(Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989).

Recombinant baculovirus expression vectors have been developed for infection into several insect
cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti,
Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and
Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol 56:153; Wright (1986) Nature
321:718; Smith et al., (1983) Mol Cell Biol 3:2156; and see generally, Fraser, et al (1989) In
Vitro Cell Dev. Biol 25:225).

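
As a rough illustration of why the visual plaque screen described above matters, the following sketch estimates how many plaques would need to be examined to find at least one recombinant when recombinants make up about 1% to 5% of the virus. It assumes, as a simplification not stated in this specification, that each plaque is independently recombinant with the same probability; the function name and the 95% confidence target are illustrative choices only.

```python
import math


def plaques_to_screen(recombinant_fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of plaques n such that P(at least one recombinant) >= confidence.

    Assumes each plaque is independently recombinant with probability
    `recombinant_fraction` (an illustrative simplification).
    """
    if not 0.0 < recombinant_fraction < 1.0:
        raise ValueError("recombinant_fraction must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # P(no recombinant among n plaques) = (1 - f)**n; solve (1 - f)**n <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - recombinant_fraction))


for fraction in (0.01, 0.05):
    print(f"{fraction:.0%} recombinants: screen about {plaques_to_screen(fraction)} plaques")
```

Under these assumptions, a 1% recombination frequency calls for about 299 plaques and a 5% frequency for about 59, which is why a screen that can be read directly under the light microscope is convenient.
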
Cells and cell culture media are commercially available for both direct and fusion expression of
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally 
known to those skilled in the art. See, eg. Summers and Smith supra. 

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for 
stable maintenance of the plasmid(s) present in the modified insect host. Where the expression product
gene is under inducible control, the host may be grown to high density, and expression induced. 
Alternatively, where expression is constitutive, the product will be continuously expressed into the 
medium and the nutrient medium must be continuously circulated, while removing the product of 
interest and augmenting depleted nutrients. The product may be purified by such techniques as 
chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography, etc.;
electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate, the
product may be further purified, as required, so as to remove substantially any insect proteins which 
are also secreted in the medium or result from lysis of insect cells, so as to provide a product which 
is at least substantially free of host debris, eg. proteins, lipids and polysaccharides. 

In order to obtain protein expression, recombinant host cells derived from the transformants are
incubated under conditions which allow expression of the recombinant protein encoding sequence.
These conditions will vary, dependent upon the host cell selected. However, the conditions are
readily ascertainable to those of ordinary skill in the art, based upon what is known in the art.

iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art.
Exemplary plant cellular genetic expression systems include those described in patents, such as:
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions
of plant protein signal peptides may be found, in addition to the references described above, in
Vaulcombe et al., Mol Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology
3:407-418 (1984); Rogers, J. Biol Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356
(1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation
of plant gene expression by the phytohormone gibberellic acid, and of secreted enzymes induced by
gibberellic acid, can be found in R.L. Jones and J. MacMillin, Gibberellins: in: Advanced Plant
Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.



WO 99/24578 PCT/IB98/01665 

References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038
(1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci.
84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an
expression cassette comprising genetic regulatory elements designed for operation in plants. The
expression cassette is inserted into a desired expression vector with companion sequences upstream
and downstream from the expression cassette suitable for expression in a plant host. The
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the
vector to permit the vectors to move DNA from an original cloning host, such as bacteria, to the
desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host
range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes.
Where the heterologous gene is not readily amenable to detection, the construct will preferably also
have a selectable marker gene suitable for determining if a plant cell has been transformed. A
general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome 
are also recommended. These might include transposon sequences and the like for homologous 
recombination as well as Ti sequences which permit random insertion of a heterologous expression 
cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward
antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions 
may also be present in the vector, as is known in the art. 

The nucleic acid molecules of the subject invention may be included into an expression cassette 
for expression of the protein(s) of interest. Usually, there will be only one expression cassette, 
although two or more are feasible. The recombinant expression cassette will contain, in addition
to the heterologous protein encoding sequence, the following elements: a promoter region, plant 5'
untranslated sequences, an initiation codon (depending upon whether or not the structural gene comes
equipped with one), and a transcription and translation termination sequence. Unique restriction
enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a pre-existing vector. 
A heterologous coding sequence may be for any protein relating to the present invention. The
sequence encoding the protein of interest will encode a signal peptide which allows processing and
translocation of the protein, as appropriate, and will usually lack any sequence which might result
in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during
germination, by employing the signal peptide which provides for translocation, one may also
provide for translocation of the protein of interest. In this way, the protein(s) of interest will be
translocated from the cells in which they are expressed and may be efficiently harvested. Typically,
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the
seed. While it is not required that the protein be secreted from the cells in which the protein is
produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eukaryotic cell, it is desirable
to determine whether any portion of the cloned gene contains sequences which will be processed
out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron"
region may be conducted to prevent losing a portion of the genetic message as a false intron code,
Reed and Maniatis, Cell 41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically
transfer the recombinant DNA. Crossway, Mol Gen. Genet, 202:179-185, 1985. The genetic
material may also be transferred into the plant cell by using polyethylene glycol, Krens, et al.,
Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of
small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and
Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley endosperm to create
transgenic barley. Yet another method of introduction would be fusion of protoplasts with other
entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc.
Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation. (Fromm et al., Proc. Natl 
Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the 
presence of plasmids containing the gene construct. Electrical impulses of high field strength 
reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated
plant protoplasts reform the cell wall, divide, and form plant callus. 
All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can
be transformed by the present invention so that whole plants are recovered which contain the
transferred gene. It is known that practically all plants can be regenerated from cultured cells or
tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the
genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum,
Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium,
Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium,
Zea, Triticum, and Sorghum.


Means for regeneration vary from species to species of plants, but generally a suspension of
transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue
is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo
formation can be induced from the protoplast suspension. These embryos germinate as natural
embryos to form plants. The culture media will generally contain various amino acids and
hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline
to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop
simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the
history of the culture. If these three variables are controlled, then regeneration is fully reproducible
and repeatable.

In some plant cell culture systems, the desired protein of the invention may be excreted or 
alternatively, the protein may be extracted from the whole plant. Where the desired protein of the 
invention is secreted into the medium, it may be collected. Alternatively, the embryos and 
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve
soluble proteins. Conventional protein isolation and purification methods will then be used to
purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein.

iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence 
capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of 
a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation 
region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site. 
A bacterial promoter may also have a second domain called an operator, that may overlap an 
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits 
negatively regulated (inducible) transcription, as a gene repressor protein may bind the operator and
thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence
of negative regulatory elements, such as the operator. In addition, positive regulation may be
achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5')
to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite
activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E.
coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be
either positive or negative, thereby either enhancing or reducing transcription. 

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences.
Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose (lac) [Chang et al. (1977) Nature 795:1056], and maltose. Additional examples include
promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al.
(1980) Nuc. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US
patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system
[Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I. Gresser)],
bacteriophage lambda PL [Shimatake et al (1981) Nature 292:128] and T5 [US patent 4,689,406]
promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. 
For example, transcription activation sequences of one bacterial or bacteriophage promoter may 
be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a 
synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac
promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac
repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21]. 




Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin 
that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally 
occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase 
to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA 
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J.
Mol. Biol. 189:113; Tabor et al. (1985) Proc Natl. Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region
(EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for
the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the
pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979)
"Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and
Development: Gene Expression (ed. R.F. Goldberger)]. For the expression of eukaryotic genes and
prokaryotic genes with a weak ribosome-binding site, see Sambrook et al. (1989) "Expression of cloned
genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual.
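
The spacing rule described above (a motif of 3-9 nucleotides lying 3-11 nucleotides upstream of the ATG) can be sketched as a simple scan. The sketch below is only an illustration: the consensus string `TAAGGAGGT` is an assumption (taken as the DNA-alphabet complement of the 3' end of E. coli 16S rRNA), the "3-11 nucleotides" is interpreted here as the gap between the motif's 3' end and the ATG, and the function name `find_sd` is hypothetical.

```python
from typing import Optional, Tuple

# Assumed consensus (DNA alphabet), complementary to the 3' end of E. coli 16S rRNA.
SD_CONSENSUS = "TAAGGAGGT"


def find_sd(seq: str, start_codon_index: int,
            min_len: int = 3, max_len: int = 9,
            min_gap: int = 3, max_gap: int = 11) -> Optional[Tuple[str, int]]:
    """Return (motif, gap) for the longest SD-like motif upstream of the start codon.

    `gap` is the number of nucleotides between the 3' end of the motif and the
    first base of the start codon. Returns None when no motif is found.
    """
    seq = seq.upper()
    if seq[start_codon_index:start_codon_index + 3] != "ATG":
        raise ValueError("start_codon_index does not point at an ATG codon")
    best = None
    for gap in range(min_gap, max_gap + 1):
        motif_end = start_codon_index - gap
        # Try the longest candidate first at each gap.
        for length in range(max_len, min_len - 1, -1):
            motif_start = motif_end - length
            if motif_start < 0:
                continue
            candidate = seq[motif_start:motif_end]
            # A candidate counts if it is a contiguous piece of the assumed consensus.
            if candidate in SD_CONSENSUS and (best is None or length > len(best[0])):
                best = (candidate, gap)
                break  # shorter candidates at this gap cannot beat this one
    return best


# Hypothetical test sequence: AGGAGG placed 7 nucleotides upstream of the ATG.
print(find_sd("CCAGGAGGCAACACCATGGCT", 15))
```

A real ribosome binding site need not match the consensus exactly, so an exact-substring test such as this one will miss weaker sites; it is meant only to make the length and spacing windows concrete.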

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked 
with the DNA molecule, in which case the first amino acid at the N-terminus will always be a
methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus
may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo
or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end
of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the bacteriophage lambda cII gene can be linked at the 5'
terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains
a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign protein
[Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be made with sequences from the
lacZ [Jia et al. (1987) Gene 50:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al.
(1989) J. Gen. Microbiol 135:11] and CheY [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another example
is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably
retains a site for a processing enzyme (eg. ubiquitin-specific processing protease) to cleave the
ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated
[Miller et al. (1989) Bio/Technology 7:698].
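
The release of a foreign protein from a fusion at a retained processing site can be pictured with a short sketch. It assumes that the factor Xa recognition sequence is Ile-Glu-Gly-Arg (`IEGR` in one-letter code) with cleavage after the Arg; the carrier and foreign sequences used below are invented placeholders, and the function name is hypothetical.

```python
from typing import Tuple

# Assumed factor Xa recognition sequence (Ile-Glu-Gly-Arg); cleavage follows the Arg.
FACTOR_XA_SITE = "IEGR"


def release_foreign_protein(fusion: str, site: str = FACTOR_XA_SITE) -> Tuple[str, str]:
    """Split a fusion protein (one-letter amino acid codes) after the first cleavage site.

    Returns (carrier_with_site, foreign_protein).
    """
    position = fusion.find(site)
    if position == -1:
        raise ValueError("no cleavage site found in the fusion protein")
    cut = position + len(site)
    return fusion[:cut], fusion[cut:]


# Placeholder carrier "MKTAYLA" + site + placeholder foreign protein "VHLTPEEK".
carrier, foreign = release_foreign_protein("MKTAYLAIEGRVHLTPEEK")
print(carrier, foreign)
```

Because the cut falls immediately after the site, the released fragment begins at the first residue of the foreign sequence, which is the sense in which a native foreign protein can be recovered. The sketch cuts only at the first site; a foreign protein that itself contains the site would be cut internally as well.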

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules 
that encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion 
of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes 
a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the
cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic
space, located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably
there are processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal
peptide fragment and the foreign gene.

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins,
such as the E. coli outer membrane protein gene (ompA) [Masui et al (1983), in: Experimental
Manipulation of Gene Expression; Ghrayeb et al (1984) EMBO J. 3:2437] and the E. coli alkaline
phosphatase signal sequence (phoA) [Oka et al (1985) Proc. Natl Acad. Sci. 82:7212]. As an
additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains
can be used to secrete heterologous proteins from B. subtilis [Palva et al (1982) Proc. Natl Acad.
Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 
3' to the translation stop codon, and thus together with the promoter flank the coding sequence.
These sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Transcription termination sequences frequently include DNA sequences of
about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription. 
Examples include transcription termination sequences derived from genes with strong promoters, 
such as the trp gene in E. coli as well as other biosynthetic genes. 
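
A stem loop of the kind mentioned above arises where a stretch of sequence is followed, after a short spacer, by its own reverse complement. The following sketch scans a DNA string for such perfect inverted repeats. The stem length, the loop size window, the function names, and the test sequence are all illustrative assumptions; real terminators tolerate mismatches and are not defined by this simple rule.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_stem_loops(seq: str, stem: int = 6,
                    min_loop: int = 3, max_loop: int = 8) -> List[Tuple[int, str, str]]:
    """Find perfect inverted repeats that could fold into a stem loop.

    Returns a list of (start_index, left_stem, loop) tuples, where the right
    stem is the reverse complement of the left stem.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[start:start + stem]
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right = seq[right_start:right_start + stem]
            if len(right) < stem:
                break  # longer loops would run past the end of the sequence
            if right == reverse_complement(left):
                hits.append((start, left, seq[start + stem:right_start]))
    return hits


# Hypothetical sequence: GCCCGC, a TTTT loop, then GCGGGC (the reverse complement).
print(find_stem_loops("TTGCCCGCTTTTGCGGGCTT"))
```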

Usually, the above described components, comprising a promoter, signal sequence (if desired), 
coding sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will
have a replication system, thus allowing it to be maintained in a prokaryotic host either for
expression or for cloning and amplification. In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number ranging
from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably contain at least about 10, and more preferably at least about 20 
plasmids. Either a high or low copy number vector may be selected, depending upon the effect of 
the vector and the foreign protein on the host. 
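
The ordering of components in an expression construct (promoter, optional signal sequence, coding sequence, transcription termination sequence) can be written down as a small data structure. This is a minimal sketch only: the class name, the field names, the validation rules, and every sequence shown are hypothetical placeholders rather than sequences taken from this specification.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class ExpressionConstruct:
    """Parts of an expression construct, listed 5' to 3' (all DNA strings)."""
    promoter: str
    coding_sequence: str
    terminator: str
    signal_sequence: str = ""  # optional; left empty for intracellular expression

    def assemble(self) -> str:
        """Concatenate the parts after basic checks on the open reading frame."""
        orf = (self.signal_sequence + self.coding_sequence).upper()
        if not orf.startswith("ATG"):
            raise ValueError("open reading frame must begin with an ATG start codon")
        if len(orf) % 3 != 0 or orf[-3:] not in STOP_CODONS:
            raise ValueError("open reading frame must be in frame and end with a stop codon")
        return self.promoter.upper() + orf + self.terminator.upper()


# Placeholder sequences for illustration only.
construct = ExpressionConstruct(
    promoter="TTGACA",
    signal_sequence="ATGAAA",
    coding_sequence="GCTTAA",
    terminator="TTTT",
)
print(construct.assemble())
```

The check that the signal sequence and coding sequence together form a single in-frame reading frame reflects the point made above that the signal peptide is expressed as a fusion with the foreign protein.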

Alternatively, the expression constructs can be integrated into the bacterial genome with an 
integrating vector. Integrating vectors usually contain at least one sequence homologous to the
bacterial chromosome that allows the vector to integrate. Integrations appear to result from 
recombinations between homologous DNA in the vector and the bacterial chromosome. For 
example, integrating vectors constructed with DNA from various Bacillus strains integrate into the 
Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers 
to allow for the selection of bacterial strains that have been transformed. Selectable markers can 
be expressed in the bacterial host and may include genes which render bacteria resistant to drugs 
such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline 
[Davies et al (1978) Annu. Rev. Microbiol 32:469]. Selectable markers may also include
biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways. 

Alternatively, some of the above described components can be put together in transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors,
have been developed for transformation into many bacteria. For example, expression vectors have
been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al (1982) Proc.
Natl Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia
coli [Shimatake et al (1981) Nature 292:128; Amann et al (1985) Gene 40:183; Studier et al
(1986) J. Mol Biol 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907],
Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655]; Streptococcus
lividans [Powell et al (1988) Appl. Environ. Microbiol 54:655], Streptomyces lividans [US patent
4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually
include the transformation of bacteria treated with CaCl2 or other agents, such as divalent
cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al (1989) FEMS Microbiol Lett. 60:273; Palva et al. (1982) Proc. Natl Acad. Sci. USA
79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988)
Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127;
Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol Biol. 53:159; Taketo
(1988) Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al. (1987) FEMS Microbiol Lett.
44:173 Lactobacillus]; [Fiedler et al. (1988) Anal. Biochem 170:38, Pseudomonas]; [Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in:
Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
32:1295; Powell et al. (1988) Appl. Environ. Microbiol 54:655; Somkuti et al. (1987) Proc. 4th
Eur. Cong. Biotechnology 7:412, Streptococcus].

v. Yeast Expression

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any
DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiation region which is usually placed proximal to the 5' end of the coding sequence.
This transcription initiation region usually includes an RNA polymerase binding site (the "TATA
Box") and a transcription initiation site. A yeast promoter may also have a second domain called
an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene.
The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence
of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or
reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase,
glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203).
The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences
[Myanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For
example, UAS sequences of one yeast promoter may be joined with the transcription activation
region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid
promoters include the ADH regulatory sequence linked to the GAP transcription activation region
(US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters
which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or
PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters
of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA
77:1078; Henikoff et al. (1981) Nature 255:835; Hollenberg et al. (1981) Curr. Topics Microbiol.
Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance
Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene
11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the 
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If 
desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with 
cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian,
baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal
portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be
linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0
196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method,
therefore, native foreign protein can be isolated (eg. WO88/024066).
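
The cyanogen bromide step mentioned above for removing an N-terminal methionine, and the cleavable junction sites of fusion proteins, both depend on where a cleavage reagent cuts. As a minimal sketch, assuming the commonly described behaviour that cyanogen bromide cleaves on the C-terminal side of every methionine residue, the following Python function (the name cnbr_fragments and the example sequences are hypothetical, chosen only for illustration) lists the fragments expected from a one-letter protein sequence. It ignores the chemical conversion of the methionine itself and any incomplete cleavage.

```python
def cnbr_fragments(protein: str) -> list[str]:
    """Split a one-letter protein sequence after every methionine (M).

    Simplified model: real cyanogen bromide cleavage also modifies the
    methionine residue and may be incomplete; neither is modelled here.
    """
    fragments = []
    current = ""
    for residue in protein:
        current += residue
        if residue == "M":
            # Cleavage occurs on the C-terminal side of methionine.
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# Hypothetical sequences, for illustration only.
print(cnbr_fragments("MKTAYIAK"))   # only the N-terminal Met is removed
print(cnbr_fragments("MKTMAYIAK"))  # an internal Met is cut as well
```

The second call illustrates why this approach is most suitable when the recombinant protein contains no internal methionine residues.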

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded
between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, 
such as the yeast invertase gene (EP-A-0 012 873; JPO 62,096,086) and the A-factor gene (US
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057). 

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor 
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor 
fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing
an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha factor (eg. see WO
89/02463).



WO 99/24578 PCT/IB98/01665 

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples of transcription terminator sequences include yeast-recognized
termination sequences, such as those from genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such
yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979) Gene 8:17-24], pCl/1 [Brake et al.
(1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al. (1982) J. Mol.
Biol. 158:151]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20. Either a high or low copy number vector
may be selected, depending upon the effect of the vector and the foreign protein on the host. See
eg. Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al. (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al.,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results
in the integration of the entire vector, or two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of yeast strains that have been transformed. Selectable markers may
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2,
TRP1, and ALG7, and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metal. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al. (1987) Microbiol.
Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz et al. (1986) Mol.
Cell. Biol. 6:142], Candida maltosa [Kunze et al. (1985) J. Basic Microbiol. 25:141], Hansenula
polymorpha [Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol.
Gen. Genet. 202:302], Kluyveromyces fragilis [Das et al. (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den Berg et al.
(1990) Bio/Technology 8:135], Pichia guillerimondii [Kunze et al. (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:706], and Yarrowia lipolytica [Davidow et al. (1985) Curr. Genet. 10:39;
Gaillardin et al. (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz
et al. (1986) Mol. Cell. Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet.
202:302; Hansenula]; [Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J.
Bacteriol. 154:737; Van den Berg et al. (1990) Bio/Technology 8:135; Kluyveromyces]; [Cregg
et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic Microbiol. 25:141; US Patent
Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929;
Ito et al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr.
Genet. 10:49; Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional
binding space with an internal surface shape and charge distribution complementary to the features
of an epitope of an antigen, which allows a binding of the antibody with the antigen. "Antibody"
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, 
immunoassays, and distinguishing/identifying Neisserial proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably
a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera
due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat
antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline,
preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or
emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 µg/injection
is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more
injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antiserum is obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be obtained
from rabbits.

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature 
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described 
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally 
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells 
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to 
a plate or well coated with the protein antigen. B-cells expressing membrane-bound 
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of 
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with 
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution,
and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites
in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly ³²P
and ¹²⁵I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase is usually detected by its
ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand molecule
with high specificity, as for example in the case of an antigen and a monoclonal antibody specific
therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and protein A,
and the numerous receptor-ligand couples known in the art. It should be understood that the above
description is not meant to categorize the various labels into distinct classes, as the same label may
serve in several different modes. For example, ¹²⁵I may serve as a radioactive label or as an
electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may combine
various labels for desired effect. For example, MAbs and avidin also require labels in the practice of
this invention: thus, one might label a MAb with biotin, and detect its presence with avidin labeled
with ¹²⁵I, or with an anti-biotin MAb labeled with HRP. Other permutations and possibilities will be
readily apparent to those of ordinary skill in the art, and are considered as equivalents within the scope
of the instant invention.

Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of
either polypeptides, antibodies, or polynucleotides of the claimed invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount
in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg
or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered.
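
As a simple arithmetic illustration of the dose ranges stated above, the following Python sketch converts a per-kilogram range into an absolute range for a given body weight. The function name dose_range_mg and the 70 kg example weight are hypothetical and are not taken from the specification; the sketch is not a dosing recommendation.

```python
import math


def dose_range_mg(weight_kg: float,
                  low_mg_per_kg: float = 0.01,
                  high_mg_per_kg: float = 50.0) -> tuple[float, float]:
    """Return the (lowest, highest) absolute dose in mg for a body weight.

    Defaults reflect the broader range quoted in the text
    (about 0.01 mg/kg to 50 mg/kg).
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low_mg_per_kg must not exceed high_mg_per_kg")
    return weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg


# Hypothetical 70 kg individual: broader and narrower ranges from the text.
broad = dose_range_mg(70.0)
narrow = dose_range_mg(70.0, low_mg_per_kg=0.05, high_mg_per_kg=10.0)
print(f"broad range:  {broad[0]:.2f} mg to {broad[1]:.0f} mg")
print(f"narrow range: {narrow[0]:.2f} mg to {narrow[1]:.0f} mg")
```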

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such
as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity. Suitable
carriers may be large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus
particles. Such carriers are well known to those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack
Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water,
saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents,
pH buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic
compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable
for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. Liposomes
are included within the definition of a pharmaceutically acceptable carrier.

Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The 
subjects to be treated can be animals; in particular, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or 
therapeutic (ie. to treat disease after infection). 

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does
not itself induce the production of antibodies harmful to the individual receiving the composition.
Suitable carriers are typically large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers,
lipid aggregates (such as oil droplets or liposomes), and inactive virus particles. Such carriers are well
known to those of ordinary skill in the art. Additionally, these carriers may function as
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be conjugated to
a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc;
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion
or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system (RAS) (Ribi
Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or more bacterial
cell wall components from the group consisting of monophosphorylipid A (MPL), trehalose
dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin
adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA), or particles
generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's
Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (eg.
IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon), macrophage
colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other substances that
act as immunostimulating agents to enhance the effectiveness of the composition. Alum and
MF59™ are preferred.
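
The percentage compositions quoted above for the oil-in-water emulsions can be turned into component amounts for a batch by simple proportion. The following Python sketch does this for the MF59 figures (5% Squalene, 0.5% Tween 80, 0.5% Span 85). It assumes, purely for illustration, that the percentages are volume/volume and that the balance is aqueous phase; the text does not state the basis of the percentages, and the function name emulsion_volumes_ml and the 100 ml batch size are hypothetical.

```python
import math

# Percentages quoted in the text for MF59. Treating them as volume/volume
# is an assumption made only for this illustration.
MF59_PERCENT = {"squalene": 5.0, "Tween 80": 0.5, "Span 85": 0.5}


def emulsion_volumes_ml(total_ml: float,
                        percent_by_component: dict[str, float]) -> dict[str, float]:
    """Return the volume (ml) of each component and of the aqueous balance."""
    if total_ml <= 0:
        raise ValueError("total_ml must be positive")
    if sum(percent_by_component.values()) > 100.0:
        raise ValueError("component percentages exceed 100%")
    volumes = {name: total_ml * pct / 100.0
               for name, pct in percent_by_component.items()}
    # Whatever is not oil or surfactant is assumed to be aqueous phase.
    volumes["aqueous phase"] = total_ml - sum(volumes.values())
    return volumes


# Hypothetical 100 ml batch.
for name, ml in emulsion_volumes_ml(100.0, MF59_PERCENT).items():
    print(f"{name}: {ml:.1f} ml")
```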

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/
nucleic acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such
as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components,
as needed. By "immunologically effective amount", it is meant that the administration of that
amount to an individual, either in a single dose or as part of a series, is effective for treatment or
prevention. This amount varies depending upon the health and physical condition of the individual
to be treated, the taxonomic group of individual to be treated (eg. nonhuman primate, primate, etc.),
the capacity of the individual's immune system to synthesize antibodies, the degree of protection
desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation,
and other relevant factors. It is expected that the amount will fall in a relatively broad range that
can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection, 
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734). 
Additional formulations suitable for other modes of administration include oral and pulmonary 
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents. 

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson & 
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al. (1997) Annu Rev Immunol 
15:617-648; see later herein]. 

Gene Delivery Vehicles 

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of 
the invention, to be delivered to the mammal for expression in the mammal, can be administered 
either locally or systemically. These constructs can utilize viral or non-viral vector approaches in
in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous 
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either 
constitutive or regulated. 

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus,
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector
is employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for
example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)), polytropic retroviruses
(eg. MCF and MCF-MLV (see Kelly (1983) J. Virol. 45:291)), spumaviruses and lentiviruses. See RNA
Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of
second strand synthesis from an Avian Leukosis Virus.

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It
is preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. HT1080
cells) or mink parent cell lines, which eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J. Virol.
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res
53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller (1990)
Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.

Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that at least 5 native
nucleotides and up to 18 native nucleotides, preferably at least 10 native nucleotides up to 18 native
nucleotides, most preferably 10 native nucleotides are retained and the remaining nucleotides of
the D-sequence are deleted or replaced with non-native nucleotides. The native D-sequences of the
AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in each AAV inverted
terminal repeat (ie. there is one sequence at each end) which are not involved in HP formation. The
non-native replacement nucleotide may be any nucleotide other than the nucleotide found in the
native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19 and
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of
such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV
vector is the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US
Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368 and
Muzyczka US Patent 5,139,941, Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a
further example of an AAV vector employable in this invention is SSV9AFABTKneo, which
contains the AFP enhancer and albumin promoter and directs expression predominantly in the liver.
Its structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470.
Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 5,139,941,
and US 5,252,479.

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred 
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase 
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional 
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139 
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19 
and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242 (Breakefield), and those deposited 
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260. 

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention.
Preferred alpha virus vectors are Sindbis virus vectors. Also suitable are togaviruses, Semliki Forest
virus (ATCC VR-67; ATCC VR-1247), Middleberg virus (ATCC VR-370), Ross River virus (ATCC VR-373;
ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR923; ATCC VR-1250; ATCC
VR-1249; ATCC VR-532), and those described in US patents 5,091,309, 5,217,879, and
WO92/10578. More particularly, those alpha virus vectors described in US Serial No. 08/405,627,
filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US 5,091,309 and US 5,217,879
are employable. Such alpha viruses may be obtained from depositories or collections such as the
ATCC in Rockville, Maryland or isolated from known sources using commonly available
techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see USSN
08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing
the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered
expression systems. Preferably, the eukaryotic layered expression systems of the invention are
derived from alphavirus vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for 
example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol. 
Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990)
J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC 
VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317;
Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US 
4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those described in 
Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza virus, for 
example ATCC VR-797 and recombinant influenza viruses made employing reverse genetics 
techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci 87:3802-3805; 
Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see also McMichael
(1983) NEJ Med 309:13, and Yap (1978) Nature 273:238 and Nature (1979) 277:108); human
immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992) J. Virol 66:2731; 
measles virus, for example ATCC VR-67 and VR-1247 and those described in EP-0440219; Aura 
virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC VR-1240; 
Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and 
ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC 
VR-369 and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for
example ATCC VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu
virus, for example ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245;
Tonate virus, for example ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for
example ATCC VR-374; Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example
ATCC VR-375; O'Nyong virus, Eastern encephalitis virus, for example ATCC VR-65 and ATCC
VR-1242; Western encephalitis virus, for example ATCC VR-70, ATCC VR-1251, ATCC VR-622
and ATCC VR-1252; and coronavirus, for example ATCC VR-740 and those described in Hamre 
(1966) Proc Soc Exp Biol Med 121:190. 

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral
vectors. Other delivery methods and media may be employed such as, for example, nucleic acid
expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus alone, for
example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther
3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987,
eukaryotic cell delivery vehicles, for example see US Serial No. 08/240,030, filed May 9,
1994, and US Serial No. 08/404,796, deposition of photopolymerized hydrogel materials,
hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing radiation as
described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell
membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and
in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867. 
Briefly, the sequence can be inserted into conventional vectors that contain conventional control 
sequences for high level expression, and then incubated with synthetic gene transfer molecules such 
as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting 
ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol Chem. 
262:4429-4432, insulin as described in Hucked (1990) Biochem Pharmacol 40:253-263, galactose 
as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin. 

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in 
WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex 
beads. DNA coated latex beads are efficiently transported into cells after endocytosis initiation by the 
beads. The method may be improved further by treatment of the beads to increase hydrophobicity and 
thereby facilitate disruption of the endosome and release of the DNA into the cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional
vectors that contain conventional control sequences for high level expression, and then be incubated
with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine,
protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, insulin,
galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to encapsulate
DNA comprising the gene under the control of a variety of tissue-specific or ubiquitously-active
promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such
as the approach described in Woffendin et al (1994) Proc. Natl Acad. Sci. USA
91(24):11581-11585. Moreover, the coding sequence and the product of expression of such can be
delivered through deposition of photopolymerized hydrogel materials. Other conventional methods
for gene delivery that can be used for delivery of the coding sequence include, for example, use of
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for
activating transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120
and 4,762,915; in WO95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer,
Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem
Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol
149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy
vehicle, as the term is defined above. For purposes of the present invention, an effective dose will
be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs
in the individual to which it is administered.

Delivery Methods

Once formulated, the polynucleotide compositions of the invention can be administered (1) directly 
to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) in vitro for expression 
of recombinant proteins. The subjects to be treated can be mammals or birds. Also, human subjects 
can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known
in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications
include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic
cells, or tumor cells.



WO 99/24578 PCT/IB98/01665 


Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished
by the following procedures, for example, dextran-mediated transfection, calcium phosphate
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of
the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well
known in the art.

Polynucleotide and polypeptide pharmaceutical compositions 

In addition to the pharmaceutically acceptable carriers and salts described above, the following
additional agents can be used with polynucleotide and/or polypeptide compositions. 

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR);
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons;
granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating
factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor; and
erythropoietin. Viral antigens, such as envelope proteins, can also be used. Also, proteins from
other invasive organisms can be used, such as the 17 amino acid peptide from the circumsporozoite
protein of Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens,
thyroid hormone, or vitamins, folic acid.

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or
polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is
dextran or DEAE-dextran. Chitosan and poly(lactide-co-glycolide) can also be included.

D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes 
prior to delivery to the subject or to cells derived therefrom. 

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary
but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the
use of liposomes as carriers for delivery of nucleic acids, see Hug and Sleight (1991) Biochim.
Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol 101:512-527.

Liposomal preparations for use in the present invention include cationic (positively charged),
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA
84:7413-7416); mRNA (Malone (1989) Proc. Natl Acad. Sci. USA 86:6077-6081); and purified
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form.

Cationic liposomes are readily available. For example, N[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium
(DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand
Island, NY. (See, also, Felgner supra). Other commercially available liposomes include
Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be
prepared from readily available materials using techniques well known in the art. See, eg. Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of the synthesis
of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials include
phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC),
dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among others.
These materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate
ratios. Methods for making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs),
or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared
using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta
394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629;
Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA
76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol.
Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl Acad. Sci. USA 75:145;
and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered.
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants,
fragments, or fusions of these proteins can also be used. Also, modifications of naturally occurring
lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the delivery of
polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are included with
the polynucleotide to be delivered, no other targeting ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are
known as apoproteins. At the present, apoproteins A, B, C, D, and E have been isolated and
identified. At least two of these contain several proteins, designated by Roman numerals: AI, AII,
AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring
chylomicrons comprise A, B, C, and E apoproteins; over time these lipoproteins lose A and acquire
C and E apoproteins. VLDL comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; and
HDL comprises apoproteins A, C, and E.

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow
(1985) Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem
261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum Genet 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example,
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of
naturally occurring lipoproteins can be found, for example, in Meth. Enzymol 128 (1986). The
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding
activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and
association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance.
Such methods are described in Meth. Enzymol (supra); Pitas (1980) J. Biochem. 255:5454-5460
and Mahey (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or
recombinant methods by expression of the apoprotein genes in a desired host cell. See, for example,
Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta
30:443. Lipoproteins can also be purchased from commercial suppliers, such as Biomedical
Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be
found in Zuckermann et al PCT/US97/14465.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired
polynucleotide/polypeptide to be delivered.

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can
be used to deliver nucleic acids to a living subject either intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin,
DNA binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such
as φX174. Transcriptional factors also contain domains that bind DNA and therefore may be useful
as nucleic acid condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos,
AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that
bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and the physical properties of a polycationic agent can be extrapolated from the
list above, to construct other polypeptide polycationic agents or to produce synthetic polycationic
agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene.
Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when
combined with polynucleotides/polypeptides.

Immunodiagnostic Assays

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or,
conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based
on well defined, recombinant antigens can be developed to replace invasive diagnostic methods.
Antibodies to Neisserial proteins within biological samples, including for example, blood or serum
samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and
a variety of these are known in the art. Protocols for the immunoassay may be based, for example,
upon competition, or direct reaction, or sandwich type assays. Protocols may also, for example, use
solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody
or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye
molecules. Assays which amplify the signals from the probe are also known; examples of which
are assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such
as ELISA assays.

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed
by packaging the appropriate materials, including the compositions of the invention, in suitable
containers, along with the remaining reagents and materials (for example, suitable buffers, salt
solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions.

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in solution.
Then, the two sequences will be placed in contact with one another under conditions that favor
hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; reaction
temperature; time of hybridization; agitation; agents to block the non-specific attachment of the liquid
phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration of the sequences;
use of compounds to increase the rate of association of sequences (dextran sulfate or polyethylene
glycol); and the stringency of the washing conditions following hybridization. See Sambrook et al
[supra] Volume 2, chapter 9, pages 9.47 to 9.57.

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar 
sequences over sequences that differ. For example, the combination of temperature and salt 
concentration should be chosen that is approximately 12 to 20°C below the calculated Tm of the
hybrid under study. The temperature and salt conditions can often be determined empirically in 
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized 
to the sequence of interest and then washed under conditions of different stringencies. See 
Sambrook et al at page 9.50. 

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the
DNA being blotted and (2) the homology between the probe and the sequences being detected. The
total amount of the fragment(s) to be studied can vary a magnitude of 10, from 0.1 to 1 µg for a
plasmid or phage digest to 10^-9 to 10^-8 g for a single copy gene in a highly complex eukaryotic
genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and
exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes
can be used. For example, a single-copy yeast gene can be detected with an exposure time of only
1 hour starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with
a probe of 10^8 cpm/µg. For a single-copy mammalian gene a conservative approach would start
with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate
using a probe of greater than 10^8 cpm/µg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe 
and the fragment of interest, and consequently, the appropriate conditions for hybridization and 
washing. In many cases the probe is not 100% homologous to the fragment. Other commonly 
encountered variables include the length and total G+C content of the hybridizing sequences and 
the ionic strength and formamide content of the hybridization buffer. The effects of all of these
factors can be approximated by a single equation:

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch)

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs 
(slightly modified from Meinkoth & Wahl (1984) Anal Biochem. 138: 267-284). 
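
The equation above can be evaluated directly. The following Python sketch is illustrative only: the function and parameter names are hypothetical and not part of the specification, Ci is assumed to be in mol/L, and the 12 to 20°C offset used for the hybridization window is an assumption taken from the usual reading of the stringency guideline in Sambrook et al.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        length_bp, mismatch_percent=0.0):
    """Approximate Tm (deg C) of a DNA-DNA hybrid.

    Implements Tm = 81 + 16.6*log10(Ci) + 0.4*%(G+C)
                    - 0.6*%formamide - 600/n - 1.5*%mismatch.
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


def hybridization_window(tm, low_offset=12.0, high_offset=20.0):
    """Return (lowest, highest) temperature, assuming a window of
    12 to 20 deg C below Tm (an assumed reading of the guideline)."""
    return (tm - high_offset, tm - low_offset)


if __name__ == "__main__":
    # Hypothetical probe: 600 bp, 50% G+C, 1 M monovalent salt, 50% formamide.
    tm = melting_temperature(1.0, 50.0, 50.0, 600)
    print("Tm:", tm, "window:", hybridization_window(tm))
```

Each percent of mismatch lowers the estimate by 1.5°C, so the same function can be used to compare a perfectly matched probe with one of reduced homology.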

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be 
conveniently altered. The temperature of the hybridization and washes and the salt concentration 
during the washes are the simplest to adjust. As the temperature of the hybridization increases (ie. 
stringency), it becomes less likely for hybridization to occur between strands that are 
nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely 
homologous with the immobilized fragment (as is frequently the case in gene family and 
interspecies hybridization experiments), the hybridization temperature must be reduced, and 
background will increase. The temperature of the washes affects the intensity of the hybridizing 
band and the degree of background in a similar manner. The stringency of the washes is also 
increased with decreasing salt concentrations. 

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for
a probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology,
and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered
and temperature adjusted accordingly, using the equation above. If the homology between the probe
and the target fragment is not known, the simplest approach is to start with both hybridization and
wash conditions which are nonstringent. If non-specific bands or high background are observed
after autoradiography, the filter can be washed at high stringency and reexposed. If the time
required for exposure makes this approach impractical, several hybridization and/or washing
stringencies should be tested in parallel.
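
The temperature guidance above amounts to a simple lookup. The sketch below is a hypothetical helper, not part of the specification; because the stated ranges share their end points (95% and 90%), assigning a boundary value to the higher temperature is an assumption made here for illustration.

```python
def hybridization_temp_50pct_formamide(homology_percent):
    """Suggested hybridization temperature (deg C) in 50% formamide.

    Returns None below 85% homology, where the text instead advises
    lowering the formamide content and recalculating from the Tm equation.
    """
    if not 0.0 <= homology_percent <= 100.0:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95.0:
        return 42.0
    if homology_percent >= 90.0:
        return 37.0
    if homology_percent >= 85.0:
        return 32.0
    return None


if __name__ == "__main__":
    for homology in (98.0, 92.0, 87.0, 70.0):
        print(homology, hybridization_temp_50pct_formamide(homology))
```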

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid 
probes according to the invention can determine the presence of cDNA or mRNA. A probe is said
to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex, 
which is stable enough to be detected. 

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention 
(including both sense and antisense strands). Though many different nucleotide sequences will 
encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual
sequence present in cells. mRNA represents a coding sequence and so a probe should be 
complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and 
so a cDNA probe should be complementary to the non-coding sequence. 
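
The strand relationships described above can be illustrated with a short Python sketch. The function names are hypothetical, sequences are assumed to be written 5' to 3' using only A, C, G, T (or U), and the probes are returned as DNA.

```python
# Map each base to its Watson-Crick partner; U is treated like T.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' sequence, as DNA."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for_mrna(coding_strand):
    """mRNA carries the coding sequence, so the probe is its complement."""
    return reverse_complement(coding_strand)


def probe_for_single_stranded_cdna(coding_strand):
    """Single-stranded cDNA is complementary to the mRNA, so the probe is
    complementary to the non-coding strand, i.e. it reads as the coding
    sequence itself."""
    return reverse_complement(reverse_complement(coding_strand))


if __name__ == "__main__":
    coding = "ATGGCC"  # hypothetical fragment, not a sequence of the invention
    print(probe_for_mrna(coding), probe_for_single_stranded_cdna(coding))
```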

The probe sequence need not be identical to the Neisserial sequence (or its complement); some
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe
can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can
include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may
also be helpful as a label to detect the formed duplex. For example, a non-complementary
nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe
sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases
or longer sequences can be interspersed into the probe, provided that the probe sequence has
sufficient complementarity with a Neisserial sequence in order to hybridize therewith and
thereby form a duplex which can be detected.

The exact length and sequence of the probe will depend on the hybridization conditions, such as
temperature, salt condition and the like. For example, for diagnostic applications, depending on the
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable
hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al
[J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al [Proc. Natl. Acad. Sci. USA
(1983) 80: 7461], or using commercially available automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain applications,
DNA or RNA are appropriate. For other applications, modifications may be incorporated eg.
backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase
in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer
(1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as
peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt
et al (1993) TIBTECH 11:384-386].

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting
small amounts of target nucleic acids. The assay is described in: Mullis et al. [Meth. Enzymol
(1987) 155: 335-350]; US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize
with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence
that does not hybridize to the sequence of the amplification target (or its complement) to aid with
duplex stability or, for example, to incorporate a convenient restriction site. Typically, such
sequence will flank the desired Neisserial sequence.
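
A primer pair carrying non-hybridizing 5' tails can be sketched as follows. This is an illustrative Python example only: the function names, the 18-nucleotide annealing length, the target fragment and the choice of BamHI (GGATCC) and EcoRI (GAATTC) sites as tails are all assumptions, not details taken from the specification.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(target, anneal_len=18, fwd_tail="", rev_tail=""):
    """Return (forward, reverse) primers, both written 5'->3'.

    The 3' part of each primer anneals to one end of the target; the
    optional 5' tail does not hybridize and flanks the amplified target.
    """
    if anneal_len <= 0 or len(target) < anneal_len:
        raise ValueError("target is shorter than the annealing length")
    forward = fwd_tail + target[:anneal_len]
    reverse = rev_tail + reverse_complement(target[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical target fragment, not a sequence of the invention.
    target = "ATGAAACGTCTGGCTATCGGTACCGCTTTAGCTGCTTGCGGTTCTCAATAA"
    fwd, rev = design_primers(target, fwd_tail="GGATCC", rev_tail="GAATTC")
    print(fwd)
    print(rev)
```

Placing the tail at the 5' end leaves the 3' end fully matched to the template, which is the end extended by the polymerase.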

A thermostable polymerase creates copies of target nucleic acids from the primers using the 
original target nucleic acids as a template. After a threshold amount of target nucleic acids are 
generated by the polymerase, they can be detected by more traditional methods, such as Southern 
blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial
sequence (or its complement). 

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook
et al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified
and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid
support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed
to remove any unhybridized probe. Next, the duplexes containing the labeled probe are detected.
Typically, the probe is labelled with a radioactive moiety.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for
ORFs 37, 5, 2, 15, 22, 28, 32, 4, 61, 76, 89, 97, 106, 138, 23, 25, 27, 79, 85 and 132. M1 and M2
are molecular weight markers. Arrows indicate the position of the main recombinant product or,
in Western blots, the position of the main N.meningitidis immunoreactive band. TP indicates
N.meningitidis total protein extract; OMV indicates N.meningitidis outer membrane vesicle
preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲)
shows GST control data; a circle (•) shows data with recombinant N.meningitidis protein.
Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an
AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et
al. (1989) J. Immunol 143:3007; Roberts et al. (1996) AIDS Res Hum Retrovir 12:593; Quakyi et
al. (1992) Scand J Immunol suppl.11:9] and is available in the Protean package of DNASTAR, Inc.
(1228 South Park Street, Madison, Wisconsin 53715 USA).

EXAMPLES 

The examples describe nucleic acid sequences which have been identified in N.meningitidis, along
with their putative translation products, and also those of N.gonorrhoeae. Not all of the nucleic acid
sequences are complete ie. they encode less than the full-length wild-type protein.

The examples are generally in the following format:

• a nucleotide sequence which has been identified in N.meningitidis (strain B)

• the putative translation product of this sequence

• a computer analysis of the translation product based on database comparisons

• corresponding gene and protein sequences identified in N.meningitidis (strain A) and in
N.gonorrhoeae

• a description of the characteristics of the proteins which indicates that they might be
suitably antigenic

• results of biochemical analysis (expression, purification, ELISA, FACS etc.)

The examples typically include details of sequence identity between species and strains. Proteins
that are similar in sequence are generally similar in both structure and function, and the sequence
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of
known function is widely used as a guide for the assignment of putative protein function to a new
sequence and has proved particularly useful in whole-genome analyses.

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [eg. see also
Altschul et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences.

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used
to compare the ORFs (from GCG Wisconsin Package, version 9.0).

Dots within nucleotide sequences (eg. position 495 in SEQ ID 11) represent nucleotides which have
been arbitrarily introduced in order to maintain a reading frame. In the same way, double-underlined
nucleotides were removed. Lower case letters (eg. position 496 in SEQ ID 11) represent
ambiguities which arose during alignment of independent sequencing reactions (some of the
nucleotide sequences in the examples are derived from combining the results of two or more
experiments).

Nucleotide sequences were scanned in all six reading frames to predict the presence of hydrophobic 
domains using an algorithm based on the statistical studies of Esposti et al [Critical evaluation of 
the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These domains 
represent potential transmembrane regions or hydrophobic leader sequences. 

Open reading frames were predicted from fragmented nucleotide sequences using the program
ORFFINDER (NCBI). 
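
The idea of scanning a nucleotide sequence in all six reading frames can be sketched as follows. This is a minimal illustration only, not the ORFFINDER program or the hydropathy algorithm cited above; the function names and the `min_aa` cut-off are assumptions introduced for the example, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT symbols are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; ambiguous codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(dna, min_aa=10):
    """List (strand, frame, peptide) for Met-initiated stretches in six frames.

    The frame of a '-' strand hit is counted on the reverse complement.
    A stretch that runs to the end of the fragment without a stop codon is
    still reported, because fragments may contain partial ORFs.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    orfs.append((strand, frame, segment[start:]))
    return orfs


# First 50 nucleotides of SEQ ID 3 as printed in Example 1.
fragment = "ATGAAACAGACAGTCAAATGGCTTGCCGCCGCCCTGATTGCCTTGGGCTT"
hits = six_frame_orfs(fragment)
```

Each candidate peptide could then be passed to a hydropathy or localisation predictor of the reader's choice.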

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences
in the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional
domains were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE).

Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the
examples. For example, the proteins can be expressed recombinantly and used to screen patient sera
by immunoblot. A positive reaction between the protein and patient serum indicates that the patient
has previously mounted an immune response to the protein in question ie. the protein is an
immunogen. This method can also be used to identify immunodominant proteins.

The recombinant protein can also be conveniently used to prepare antibodies eg. in a mouse. These 
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody 
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label 
on the bacterial surface confirms the location of the protein. 

In particular, the following methods (A) to (S) were used to express, purify and biochemically
characterise the proteins of the invention: 

A) Chromosomal DNA preparation 

N.meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by
centrifugation, and resuspended in 5ml buffer (20% sucrose, 50mM Tris-HCl, 50mM EDTA, pH 8).
After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution (50mM
NaCl, 1% Na-Sarkosyl, 50µg/ml Proteinase K), and the suspension was incubated at 37°C for 2
hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamyl alcohol (24:1)
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2
volumes ethanol, and was collected by centrifugation. The pellet was washed once with 70%
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA
concentration was measured by reading the OD at 260 nm.
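
The conversion from an OD260 reading to a DNA concentration can be sketched as below. The text does not state the conversion factor used; the sketch assumes the conventional value of about 50µg/ml of double-stranded DNA per OD260 unit in a 1cm light path, and the function name and the dilution parameter are illustrative.

```python
def dsdna_ug_per_ml(od260, dilution_factor=1.0, ug_per_ml_per_od=50.0):
    """Estimate double-stranded DNA concentration from an OD260 reading.

    ug_per_ml_per_od = 50 is the conventional factor for dsDNA (assumption,
    not taken from the text); dilution_factor is how many fold the sample
    was diluted before reading.
    """
    return od260 * dilution_factor * ug_per_ml_per_od


# Example: a 1:20 dilution reading 0.5 OD corresponds to about 500 µg/ml.
stock = dsdna_ug_per_ml(0.5, dilution_factor=20)
```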

B) Oligonucleotide design 

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF,
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A
sequence, adapted to the codon preference usage of meningococcus as necessary. Any predicted
signal peptides were omitted, by deducing the 5'-end amplification primer sequence immediately
downstream from the predicted leader sequence.

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI,
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers included
a XhoI restriction site. This procedure was established in order to direct the cloning of each
amplification product (corresponding to each ORF) into two different expression systems: pGEX-KG
(using either BamHI-XhoI or EcoRI-XhoI), and pET21b+ (using either NdeI-XhoI or NheI-XhoI).

5'-end primer tail:    CGC GGATCC CATATG       (BamHI-NdeI)
                       CGC GGATCC GCTAGC       (BamHI-NheI)
                       CCG GAATTC TA GCTAGC    (EcoRI-NheI)

3'-end primer tail:    CCCG CTCGAG             (XhoI)

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF;
the same 3' XhoI primer was used as before:

5'-end primer tail:    GGAATTC CATATG GCCATGG  (NdeI)
5'-end primer tail:    CG GGATCC               (BamHI)

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminus His-tag
fusion. In this particular case, the predicted signal peptide was included in the final product.
NheI-BamHI restriction sites were incorporated using primers:

5'-end primer tail:    GATC AGCTAGC CATATG     (NheI)

3'-end primer tail:    C GGGATCC               (BamHI)

As well as containing the restriction enzyme recognition sequences, the primers included
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing
nucleotides depended on the melting temperature of the whole primer, and was determined for each
primer using the formulae:

Tm = 4 (G+C) + 2 (A+T)                  (tail excluded)

Tm = 64.9 + 0.41 (% GC) - 600/N         (whole primer)

The average melting temperatures of the selected oligos were 65-70°C for the whole oligo and
50-55°C for the hybridising region alone.
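
The two melting-temperature formulae can be applied as in the sketch below, where N is taken to be the length of the whole primer in nucleotides. The function names are illustrative, and the hybridising region used in the example is simply the stretch of SEQ ID 3 immediately downstream of the predicted leader; it is not claimed to be a primer from Table I.

```python
def tm_hybridising(region):
    """Tm = 4(G+C) + 2(A+T), applied to the hybridising region (tail excluded)."""
    region = region.upper()
    gc = region.count("G") + region.count("C")
    at = region.count("A") + region.count("T")
    return 4 * gc + 2 * at


def tm_whole(primer):
    """Tm = 64.9 + 0.41(%GC) - 600/N, applied to the whole primer of length N."""
    primer = primer.upper()
    n = len(primer)
    gc_percent = 100.0 * (primer.count("G") + primer.count("C")) / n
    return 64.9 + 0.41 * gc_percent - 600.0 / n


tail = "CGCGGATCCCATATG"            # BamHI-NdeI 5'-end tail from the text
hybridising = "GATGACGTATCGGATTTT"  # illustrative region, not a Table I primer
tm_region = tm_hybridising(hybridising)      # 50 °C
tm_primer = tm_whole(tail + hybridising)     # about 66.6 °C
```

For this illustrative primer both values fall inside the ranges quoted above (50-55°C for the hybridising region, 65-70°C for the whole oligo); in practice the hybridising region would be lengthened or shortened until they do.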

Table I (page 487) shows the forward and reverse primers used for each amplification. In certain
cases, it will be noted that the sequence of the primer does not exactly match the sequence in the
ORF. When initial amplifications were performed, the complete 5' and/or 3' sequence was not
known for some meningococcal ORFs, although the corresponding sequences had been identified
in gonococcus. For amplification, the gonococcal sequences could thus be used as the basis for
primer design, altered to take account of codon preference. In particular, the following codons were
changed: ATA→ATT; TCG→TCT; CAG→CAA; AAG→AAA; GAG→GAA; CGA→CGC;
CGG→CGC; GGG→GGC. Italicised nucleotides in Table I indicate such a change. It will be
appreciated that, once the complete sequence has been identified, this approach is generally no
longer necessary.
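
The codon substitutions listed above can be expressed as a simple lookup applied codon by codon. This is a sketch only: the function name is illustrative, and the input is assumed to be an in-frame coding sequence. All eight changes are synonymous in the standard genetic code, so the encoded protein is unchanged.

```python
# Codon changes listed in the text (gonococcal codon -> preferred codon).
CODON_CHANGES = {
    "ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
    "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC",
}


def adapt_codons(cds):
    """Apply the listed codon changes to an in-frame coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(CODON_CHANGES.get(codon, codon) for codon in codons)


# Hypothetical five-codon example: ATG AAG CAG GGG TCG
adapted = adapt_codons("ATGAAGCAGGGGTCG")   # ATG AAA CAA GGC TCT
```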

Oligos were synthesized by a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by
addition of 0.3M Na-acetate and 2 volumes ethanol. The samples were then centrifuged and the
pellets resuspended in either 100µl or 1ml of water. OD260 was determined using a Perkin Elmer
Lambda Bio spectrophotometer and the concentration was determined and adjusted to 2-10pmol/µl.

C) Amplification 

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template
in the presence of 20-40µM of each oligo, 400-800µM dNTPs solution, 1x PCR buffer (including
1.5mM MgCl2), 2.5 units TaqI DNA polymerase (using Perkin-Elmer AmpliTaQ, GIBCO
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase).

In some cases, PCR was optimised by the addition of 10µl DMSO or 50µl 2M betaine.

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix
at 95°C), each sample underwent a double-step amplification: the first 5 cycles were performed
using the hybridization temperature of the oligos excluding the restriction enzyme tail,
followed by 30 cycles performed at the hybridization temperature of the whole-length
oligos. The cycles were followed by a final 10 minute extension step at 72°C.

The standard cycles were as follows:

                   Denaturation     Hybridisation    Elongation

First 5 cycles     30 seconds       30 seconds       30-60 seconds
                   95°C             50-55°C          72°C

Last 30 cycles     30 seconds       30 seconds       30-60 seconds
                   95°C             65-70°C          72°C

The elongation time varied according to the length of the ORF to be amplified. 
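
The double-step programme described above (hot start, 5 cycles at the lower hybridisation temperature, 30 cycles at the higher one, final extension) can be written out as a list of steps, for instance to estimate run time. This is a sketch; the function names and the tuple layout are assumptions, and the hybridisation temperatures and elongation time are supplied per ORF.

```python
def build_program(anneal_tail_excluded_c, anneal_whole_c, elongation_s):
    """Return the thermocycler steps as (name, temperature °C, seconds)."""
    def cycle(anneal_c):
        return [("denaturation", 95, 30),
                ("hybridisation", anneal_c, 30),
                ("elongation", 72, elongation_s)]

    steps = [("hot start", 95, 180)]            # 3 minutes at 95°C
    steps += cycle(anneal_tail_excluded_c) * 5  # first 5 cycles
    steps += cycle(anneal_whole_c) * 30         # last 30 cycles
    steps.append(("final extension", 72, 600))  # 10 minutes at 72°C
    return steps


def total_seconds(steps):
    """Sum of hold times, ignoring ramping between temperatures."""
    return sum(seconds for _, _, seconds in steps)


program = build_program(52, 67, elongation_s=60)
run_time = total_seconds(program)   # 4980 s of hold time
```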

The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR 
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose 
gel and the size of each amplified fragment compared with a DNA molecular weight marker. 

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with ethanol
and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA fragment
corresponding to the right size band was then eluted and purified from gel, using the Qiagen Gel
Extraction Kit, following the instructions of the manufacturer. The final volume of the DNA
fragment was 30µl or 50µl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
as a C-terminus His-tag fusion

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
protein as N-terminus GST fusion.

- For ORF 76, NheI/BamHI for cloning into pTRC-HisA vector and further expression
of the protein as N-terminus His-tag fusion.

- EcoRI/PstI, EcoRI/SalI, SalI/PstI for cloning into pGex-His and further expression of
the protein as N-terminus His-tag fusion

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in either a 30 or 40µl final volume in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50µl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of titrated molecular weight marker.

E) Digestion of the cloning vectors (pET22B, pGEX-KG, pTRC-His A, and pGex-His)

10µg plasmid was double-digested with 50 units of each restriction enzyme in 200µl reaction
volume in the presence of appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50µl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring OD260 of the sample,
and adjusted to 50µg/µl; this plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine
residues upstream to the thrombin cleavage site and containing the multiple cloning site of the
vector pTRC99 (Pharmacia).

F) Cloning 

The fragments corresponding to each ORF, previously digested and purified, were ligated into both pET22b
and pGEX-KG. In a final volume of 20µl, a molar ratio of 3:1 fragment/vector was ligated using 0.5µl
of NEB T4 DNA ligase (400 units/µl), in the presence of the buffer supplied by the manufacturer.
The reaction was incubated at room temperature for 3 hours. In some experiments, ligation was
performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's instructions.
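
A 3:1 fragment/vector molar ratio translates into a mass of insert that depends on the relative lengths of the two molecules. The sketch below uses the usual approximation that the mass of double-stranded DNA is proportional to its length in base pairs; the function name and the example sizes are hypothetical and are not taken from the text.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Assumes dsDNA mass is proportional to length, so equal molar amounts
    have masses in the ratio insert_bp : vector_bp.
    """
    return vector_ng * insert_bp * molar_ratio / vector_bp


# Hypothetical example: 50 ng of a 5000 bp vector and a 500 bp fragment.
needed = insert_mass_ng(50, 5000, 500)   # 15 ng of fragment for 3:1
```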

In order to introduce the recombinant plasmid into a suitable strain, 100µl E.coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800µl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately 200µl
of the supernatant. The suspension was then plated on LB ampicillin (100µg/ml).

The screening of the recombinant clones was performed by growing 5 randomly-chosen colonies
overnight at 37°C in either 2ml (pGEX or pTRC clones) or 5ml (pET clones) LB broth + 100µg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30µl. 5µl of each
individual miniprep (approximately 1µg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). The screening of the
positive clones was made on the basis of the correct insert size.

For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI-PstI cloning sites or, for ORFs 115
& 127, EcoRI-SalI or, for ORF 122, SalI-PstI. After cloning, the recombinant plasmids were
introduced in the E.coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50µg/ml ampicillin.

G) Expression 

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1µl of each construct was used to transform 30µl of E.coli
BL21 (pGEX vector), E.coli TOP10 (pTRC vector) or E.coli BL21-DE3 (pET vector), as described
above. In the case of the pGEX-His vector, the same E.coli strain (W3110) was used for initial
cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100µg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100µg/ml) in
100ml flasks, making sure that the OD550 ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water bath shakers until OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and pGEX-His
vectors). For the pET, pTRC and pGEX-His vectors, the protein expression was induced by
addition of 1mM IPTG, whereas in the case of pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed, centrifuged in a microfuge,
the pellet resuspended in PBS, and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.

H) GST-fusion proteins large-scale purification. 

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were inoculated
into 20ml of LB+Amp liquid culture in a water bath shaker and grown overnight. Bacteria were
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C)
to OD550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again.
The supernatant was collected and mixed with 150µl Glutathione-Sepharose 4B resin (Pharmacia)
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was
centrifuged at 700g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion
protein was eluted by addition of 700µl cold Glutathione elution buffer (10mM reduced
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21µl of each fraction
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must
be added to the MW of each GST-fusion protein.

I) His-fusion solubility analysis (ORFs 111-129) 

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were
resuspended in buffer M1 [500µl PBS pH 7.2]. 25µl lysozyme (10mg/ml) was added and the
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight
at 4°C. The supernatants from all steps were analysed by SDS-PAGE.

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas
ORFs 111, 122, 126 and 129 needed urea and ORFs 125 and 127 needed guanidinium-HCl for their
solubilization.

J) His-fusion large-scale purification. 

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker.
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C,
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins.

The cells were disrupted by sonication on ice for 30 sec at 40W using a Branson sonifier B-15,
frozen and thawed two times and centrifuged again.

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in 2ml
buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5) and
treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40 minutes.

Supernatants were collected and mixed with 150µl Ni2+-resin (Pharmacia) (previously washed with
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06.

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM
phosphate buffer, pH 6.3) until the flow-through reached the OD280 of 0.02-0.06. The His-fusion
protein was eluted by addition of 700µl of either (i) cold elution buffer A (300mM NaCl, 50mM
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl,
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21µl of each
fraction were loaded on a 12% SDS gel.

K) His-fusion proteins renaturation 

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20µg/ml using
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione,
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for 12-14
hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine,
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14
hours at 4°C. Protein concentration was evaluated using the formula:

Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)
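
The formula above corrects the 280 nm reading for nucleic acid contamination using the 260 nm reading. A minimal sketch, with an illustrative function name and hypothetical readings:

```python
def protein_mg_per_ml(od280, od260):
    """Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)."""
    return 1.55 * od280 - 0.76 * od260


# Hypothetical readings: OD280 = 1.0, OD260 = 0.5 gives about 1.17 mg/ml.
concentration = protein_mg_per_ml(1.0, 0.5)
```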

L) His-fusion large-scale purification (ORFs 111-129)

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending
on the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the
column with the same buffer. The specific protein was eluted with the corresponding buffer
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole.
After each run the columns were sanitized by washing with at least two column volumes of 0.5 M
sodium hydroxide and reequilibrated before the next use.

M) Mice immunisations

20µg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs
2, 4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on
days 1, 21 and 42, and immune response was monitored in samples taken on day 56. For ORFs 44,
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice
were immunised using Freund's adjuvant, rather than Al(OH)3, and the same immunisation
protocol was used, except that the immune response was measured on day 42, rather than 56.
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the
immune response was measured on day 49.

N) ELISA assay (sera analysis)

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% glucose. Bacterial growth
was monitored every 30 minutes by following OD620. The bacteria were allowed to grow until the OD
reached the value of 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The
supernatant was discarded and bacteria were washed once with PBS, resuspended in PBS
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then
overnight at 4°C with stirring. 100µl bacterial cells were added to each well of a 96 well Greiner
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing
buffer (0.1% Tween-20 in PBS). 200µl of saturation buffer (2.7% Polyvinylpyrrolidone 10 in
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed
three times with PBT. 200µl of diluted sera (dilution buffer: 1% BSA, 0.1% Tween-20, 0.1% NaN3
in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were
washed three times with PBT. 100µl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at
37°C. Wells were washed three times with PBT buffer. 100µl of substrate buffer for HRP (25ml
of citrate buffer pH5, 10mg of O-phenylenediamine and 10µl of H2O) were added to each well and the
plates were left at room temperature for 20 minutes. 100µl H2SO4 was added to each well and OD490
was followed. The ELISA was considered positive when OD490 was 2.5 times that of the respective
pre-immune sera.
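
The positivity criterion can be stated as a one-line comparison. In this sketch the criterion is read as "at least 2.5 times the pre-immune value", which is an interpretation of the wording above; the function name and the example readings are hypothetical.

```python
def elisa_positive(od490_immune, od490_preimmune, fold=2.5):
    """True when the immune serum OD490 is at least `fold` times pre-immune.

    Reading the criterion as "at least" rather than "exactly" is an
    assumption made for this sketch.
    """
    return od490_immune >= fold * od490_preimmune


# Hypothetical readings.
strong = elisa_positive(0.50, 0.10)   # True: 0.50 >= 0.25
weak = elisa_positive(0.20, 0.10)     # False: 0.20 < 0.25
```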

O) FACScan bacteria Binding Assay procedure.

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 4 tubes containing 8ml each Mueller-Hinton Broth (Difco) containing 0.25%
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were
allowed to grow until the OD reached the value of 0.35-0.5. The culture was centrifuged for 10 minutes
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1%
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking
buffer to reach OD620 of 0.07. 100µl bacterial cells were added to each well of a Costar 96 well
plate. 100µl of diluted (1:200) sera (in blocking buffer) were added to each well and plates
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant
aspirated and cells washed by addition of 200µl/well of blocking buffer. 100µl of
R-Phycoerythrin conjugated F(ab)2 goat anti-mouse, diluted 1:100, was added to each well and plates
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and
washed by addition of 200µl/well of blocking buffer. The supernatant was aspirated and cells
resuspended in 200µl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan
tubes and read. The FACScan settings were: FL1 on, FL2 and FL3 off; FSC-H
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains 7.1; FL-2 PMT: 539;
compensation values: 0.

P) OMV preparations

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10 ml 20mM
Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria disrupted by
sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were removed by
centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered by centrifugation
at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from the crude outer
membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and incubated at room
temperature for 20 minutes. The suspension was centrifuged at 10000g for 10 minutes to remove
aggregates, and the supernatant further ultracentrifuged at 50000g for 75 minutes to pellet the outer
membranes. The outer membranes were resuspended in 10mM Tris-HCl, pH8 and the protein
concentration measured by the Bio-Rad Protein assay, using BSA as a standard.

Q) Whole Extracts preparation 

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting

Purified proteins (500ng/lane), outer membrane vesicles (5µg) and total cell extracts (25µg) derived
from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transferring buffer (0.3%
Tris base, 1.44% glycine, 20% methanol). The membrane was saturated by overnight incubation
at 4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase labelled
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with
the Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water.

S) Bactericidal assay 

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected and
used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a nutator
and allowed to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml Eppendorf
tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was washed
once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD620 of 0.5, diluted
1:20000 in Gey's buffer and stored at 25°C.

50µl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25µl of
diluted mice sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate
incubated at 4°C. 25µl of the previously described bacterial suspension were added to each well.
25µl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement
were added to each well. Immediately after the addition of the baby rabbit complement, 22µl of
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was
incubated for 1 hour at 37°C with rotation and then 22µl of each sample/well were plated on
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time
0 and time 1 hour were counted.

Table II (page 493) gives a summary of the cloning, expression and purification results.

Example 1

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 1>: 

1 ATGAAACAGA CAGTCAA.AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT
51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC
101 A^GGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG
151 T A f TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG
201 gKtcSJg CCGGCGGARC aggggttagc ccaagcccaa tacaatttgg
251 gctggatgta tgccaacggg cgcgc.gtgc gccaagatga taccgaagcg
301 ctStcS atcggcaggc ggcagcgcag ggggttgtcc aagcccaata
351 caSSgggc gtgatatatg ccgaaggacg tggagtgcgc caagacgatg
401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA
451 GCCcSS ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA
501 AGACCG...

This corresponds to the amino acid sequence <SEQ ID 2; ORF37>: 

1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM
51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA
101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ
151 AQNNLGVMYA ERXRVRQD...

Further work revealed the complete nucleotide sequence <SEQ ID 3>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 aSSgSgC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 SaCAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG 

301 GTCaStGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 rlATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 SgS CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 

451 GCCCAARACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA 

501 AGAcSc- CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG 

551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA 
This corresponds to the amino acid sequence <SEQ ID 4; ORF37-1>: 

1 MKQTVKWLAA ALIALGLNRA VWA DDVSDFR ENLQAAAQGN AAAQYNLGAM 

51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA 

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ 

151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY* 
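
The correspondence between a nucleotide sequence and its encoded amino acid sequence can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative, and any codon containing an ambiguous or unread base is reported as X:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; whitespace between blocks is ignored."""
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguous or unread bases are not in the table and give X.
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


# First 30 nucleotides of SEQ ID 3 as printed above.
print(translate("ATGAAACAGA CAGTCAAATG GCTTGCCGCC"))
```

Applied to the first three blocks of SEQ ID 3, this reproduces the first ten residues of SEQ ID 4 (MKQTVKWLAA).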

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 5>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG 

151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG 

201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC 

251 AACGCCTGAA AGCGGGTTAT TGA 

This encodes a protein having amino acid sequence <SEQ ID 6; ORF37a>: 

1 MKQTVKWLAA ALIALGLNQA VWA DDVSDFR ENLQAAAQGN AAAQNNLGVM 
51 YAERRGVRQD RALAQEWLGK ACQNGYQDSC DNDQRLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 68.0% identity over a 75aa 
overlap with ORF37a: 

10 20 30 40 50 60 

orf37.pep MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 
IIMI 11111111111:11 I 1 i I I I 1 i E I IMMMItl 111:11 :| 11:1 
orf37a MKQTVKWLAAALIALGLNQAVWADDVSDFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf37.pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG 

| J . j > ; . | 

orf37a RALAQEWLGKACQNGYQDSCDNDQRLKAGYX 

70 80 90 
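
Identity figures such as the one quoted above can be recomputed from an alignment. The sketch below assumes that identity is the number of columns with identical residues divided by the number of columns in which both sequences have a residue, and that gaps are written as "-". The convention actually used by the alignment program is not stated in this document, so both points are assumptions, and `percent_identity` is an illustrative name:

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (percent identity, overlap length) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no overlapping positions")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns), len(columns)


# First ten aligned residues of ORF37a and ORF37 from the alignment above.
pct, overlap = percent_identity("MKQTVKWLAA", "MKQTVXMLAA")
print(f"{pct:.1f}% identity over a {overlap}aa overlap")
```

Unread residues (X) are counted here as mismatches, which is one possible treatment among several.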

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 7>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG 

151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG 

201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG 

251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT 

301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA 

351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A 

This encodes a protein having amino acid sequence <SEQ ID 8; ORF37ng>: 

1 MKQTVKWLAA ALIALGLNQA VWA GDVSDFR ENLQAAEQGN AAAQFNLGVM 
51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA 
101 QQWLGKACQN GDQNSCDNDQ RLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 64.9% identity over a 111aa 
overlap with ORF37ng: 

orf37.pep MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 60 

Mi!) litllllllti: II I I 1 I I I I I i II 1111111:111:11 : I I : I 
orf37ng MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 60 

orf37.pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG 120 

: : I I : I I I : : I I I I I I I I I I I I I : I I I I I I : I : I : I : I 
orf37ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQDLALAQQWLGKACQNGDQNSCDNDQ 120 

orf37.pep VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERXRVRQD 168 

orf37ng RLKAGY 126 
The complete strain B sequence (ORF37-1) and ORF37ng show 51.5% identity in 198 aa overlap: 

10 20 30 40 50 60 

MKQTVKWLAAAL I ALGLNRAVWADDVS DFRENLQAAAQGNAAAQYNLGAMYYKGRGVRRD 
i 1 I I I I I I k 1 I 1 I I I t I I r | | f 1 lllllllllllt lltl!ll:l)l:ll :|:lll:l 
Laaalialglnqavwagdvsdfrf" " - — - ~ — — — - 

10 20 30 



orf 37-1. pep 



I i | | | 1 | | | | | | | | I I I I : II II 111111111 III I l I i I I i • i ii • I i • i • i i i • i 
Arn7nfl MKQTVKWIJUUUjIALGLNQAWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 
ortJ y " in 70 30 40 50 60 

orf 37-1. pep 



70 80 90 100 110 120 

DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGWQAQYNLG 

::||:|||:|:ill IIMIMI II : I I I I I U 
orf37ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD 

130 140 150 160 170 180 

«r.fn-1 oec viYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERRGVRQDRALAQEWFGKAC 
orrj I'tr** i I I 1 : 1 : I I I I 

LALAQQWLGKAC 

orf37ng 1Q0 

i 190 199 

orf 37-1. pep QNG DQDGCDN DQRLKAGYX 
I M I I:: 11 I I I M I I 1 1 I 
orf37ng QNGDQNSCDNDQRLKAGYX 
110 120 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 


ORF37-1 (UkDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
1A shows the results of affinity purification of the GST-fusion protein, and Figure 1B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA (positive result), FACS analysis (Figure 1C), and a 
bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 is a surface-exposed 
protein, and that it is a useful immunogen. 

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1. 
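
The document does not name the scale or window used for the hydrophilicity plot. As an illustration only, the sketch below computes a sliding-window profile using the Kyte-Doolittle hydropathy scale as a stand-in (positive values are hydrophobic, i.e. the opposite sense to hydrophilicity). The window of 7 and the name `hydropathy_profile` are assumptions:

```python
# Kyte-Doolittle hydropathy values (stand-in scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Return the mean hydropathy of each window along the sequence."""
    seq = protein.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # Raises KeyError for non-standard symbols such as X or *.
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# First 30 residues of SEQ ID 4 (ORF37-1), which include the putative leader.
profile = hydropathy_profile("MKQTVKWLAAALIALGLNRAVWADDVSDFR")
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {peak + 1}")
```

A hydrophobic stretch near the N-terminus in such a profile is the kind of feature that would be consistent with a leader sequence, although this sketch is not the analysis used to produce the figure.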

Example 2 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 9>: 

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA 

GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 
TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG 

ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC 

GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT 

CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA 

TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC 
GCCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 10>: 

1 FGDIGGLKVN APVKSAGVLV GRVGAIGLDP KSYQARVRLD LDGKYQFSSD 
51 VSAQILTSGL LGEQYIGLQQ GGDTENLAAG DTISVTSSAM VLENLIGKFM 
101 TSFAEKNADG GNAEKAAE* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a hypothetical H.influenzae protein (yrbd.haein; accession number P45029) 
SEQ ID 9 and yrbd.haein show 48.4% aa identity in 122 aa overlap: 

20 30 40 50 60 70 

LGIGALVFLGLRVANVQGFAETKSrrVTATFDNIGGLKVRAPLKIGGWIGRVSAITLDE 

I : : I I I I I I : I I : I : I I : : I I I : I I : I I 
FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 
10 20 30 

80 90 100 110 120 130 

KSYLPKVSIAINQEYNEIPENSSLSIKTSGLLGEQYIALTMGFDDGDTAMLKNGSQIQDT 
111 ::|::::: :| ::::: I 11111111111:1 I III: I : I : I I 

KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG GDTENLAAGDTISVT 

40 50 60 70 80 

140 150 160 

TSAMVLEDLIGQFL — YGSKKSDGNEKSESTEQ 
: II II I 1:1 11:1: :::|::||:: ::::|: 
SSAMVLENLIGKFMTSFAEKNADGGNAEKAAEX 
90 100 110 120 

Homology with a predicted ORF from N. gonorrhoeae 

SEQ ID 9 shows 99.2% identity over a 118aa overlap with a predicted ORF from N. gonorrhoeae: 

20 30 40 50 60 70 

GAAAVAFLAFRVAGGAAFGGSDKTYAVYADFGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 

I I II I I I I I I I I I I I I I II I I I I I I I I I II 
FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 
10 20 30 

80 90 100 110 120 130 

KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 
I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I I 
KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 
40 50 60 70 80 90 

140 150 160 

VLENLIGKFMTSFAEKNAEGGNAEKAAEX 
I I I I I I I I I I I I I I I I I I : I I I I I I I I I I 
VLENLIGKFMTSFAEKNADGGNAEKAAEX 
100 110 120 

The complete yrbd H.influenzae sequence has a leader sequence and it is expected that the 
full-length homologous N.meningitidis protein will also have one. This suggests that it is either a 
membrane protein, a secreted protein, or a surface protein and that the protein, or one of its 
epitopes, could be a useful antigen for vaccines or diagnostics. 

Example 3 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 11>: 

1 . .ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT 

51 TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC 

101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA 

151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA 

201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC 

251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC 

301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG 
351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCCTGCGAT GTTTGGTATA 

401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT 

451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA.aCCAT 

501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC 

551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA 

601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT 

651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG 

701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA 

751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC 

801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA 

851 AAGCGGTCG . . 

This corresponds to the amino acid sequence <SEQ ID 12; ORF3>: 

1 ILIYLIRKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG 

51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN 

101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV 

151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE 

201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE 

251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV. . 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 13>: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 
51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 
101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 
151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA 
201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 
251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC 
301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 
351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG 
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 
451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT 
501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG 
551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 
601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT 
651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG 
701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT 
751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA 
801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 
851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA 
901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG 
951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 
1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG 
1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG 
1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 
1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA 
1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA 

This corresponds to the amino acid sequence <SEQ ID 14; ORF3-1>: 

1 MSKFFKRLFD IVAS ASGLIF LSPVFLILIY LIR KNLGSPV FFFQERPGKD 

51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI 

101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 EKFACDVWYI DHFS LCLDIK ILLLTVKKVL IK EGISAQGE ATMPPFTGKR 

201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL 

251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT 

301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS 

351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 

401 KPLPRKNPET STA* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF3 shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) from strain A of N. 
meningitidis: 

10 20 30 
orf3.pep ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf3a MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

10 20 30 40 50 60 

40 50 60 70 80 90 

or f 3 . pep SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 
I I : I : I I I I I I I It I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I II I I I I I I I I I 
orf 3a SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

100 110 120 130 140 150 

orf 3 . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFS LCLDIKILLLTVKKVL 
I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I : I I I I : I I I I I I I I I I I I I I I I I I I I I I I 
orf 3a YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFS LCLDIKILLLTVKKVL 
130 140 150 160 170 180 

160 170 180 190 200 210 

orf3.pep IKEGISAQGEXTMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 
I I I I I I I I II I I I II I I I I I I I II I I II I I I I I I : I I I I I i I II I || I I I : I I I I I I 
orf3a IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVAELAAALGTYGEIVFLDDRVQGSVNG 
190 200 210 220 230 240 

220 230 240 250 260 270 

or f 3 . pep FSVIGTTLLLENSLS PEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVS PSAT 

I I I I I I I I I I II II I I I : I : I I I I I I I I I I I I I I I I I I I I I I I II I I : I II : I I I I I I I 
orf 3a FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
250 260 270 280 290 300 



280 

orf3.pep VGQGSVVMAKAV 

1111:1111111 

orf3a VGQGGVVMAKAVVQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
310 320 330 340 350 360 



The complete length ORF3a nucleotide sequence <SEQ ID 15> is: 



1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA 

201 TTCAGACGGC ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC 

301 CTCAAAGGCG ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG 

701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCGGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCTGACAG 

951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 16>: 



1 MSKFFKRLFD IVAS ASGLIF LSPVFLILIY LI RKNLGSPV FFFQERPGKD 
51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV 
101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 ERFACDIWYI DHFS LCLDIK ILLLTVKKVL I KEGISAQGE ATMPPFTGKR 
201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL 
251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT 
301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLAGKNTET LRS* 



Two transmembrane domains are underlined. 

ORF3-1 shows 94.6% identity in 410 aa overlap with ORF3a: 



10 



40 



10 20 30 40 50 60 

MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ii M i h h 1 1 1 1 1 m 1 1 1 1 1 1 n 1 1 m MM IN 

orf3 1 MSKFF^FDIVASASGLIFLSPVFXILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 



orf3a.pep 



70 80 90 100 110 120 

SMHDALDSDGILLPDGERLTPFGKKLRARSLDELPELWNVLKGDMSLVGPRPLiMQYLPL 

MIIIIMIMMMIMMMIMMMMMIMMIMIIIIMI 
nrf11 SMRDALDSDGI PLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 

13 orEJ 70 80 90 100 110 120 



or f 3 a. pep 



130 140 150 160 170 180 

orf3a YDNF QNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKIL^TyKKyL 
orf3a.pep ill 1 1 1 1 1 1 1 1 1 1 1 1 M II 1 1 1 M I I II II I M I I M I 1 1 1 II 1 1 1 1 I I II I II 1 1 1 1 M 

20 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYI DHFSLCLDIKI LLLTVKKVL 

or " X 130 140 150 160 170 , 180 

190 200 210 220 230 240 

OK or£3a oeo IREGISAQGEATMPPFTGKRKIAWGAGGHGKWAEIAAALGTYGEIVFLDDRVQGSVNG 

25 orfSa.pep | HI 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 I I '''''''' : jj. i, J,i 

nrf 3 1 I^GISAQGEATMPPFTGKRKIAWGAGGHGKWADIAAALGRYREIVFLDDRAQGSyNG 
orf3-l « go 20Q 210 220 230 240 

, A 250 260 270 280 290 300 

3 nr ^a npn F pviGTTLLI£NSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 

orf3a.pep FPVlfa | | | | | | | I I 1 : 1 1 I : II I 1 1 1 I 

^ 1 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 
orlJ 250 260 270 280 290 300 

35 31Q 320 330 340 350 360 

— - — vGOGGWMAKAWQADSVLKDGVIWTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
I III S I I II I I II I I I I 1 I I I I 1 1 1 t I i I 1 I 1 I I I » 2 I 1 1 I I 1 1 I I I I I I 1 I 2 I I I IM 

vgmsvWawqagsvlkdgviwtaatvdhdcllnafvhispgahlsgnthigeesw 

310 320 330 340 350 360 



orf3-l 



370 380 390 400 410 

n rf1* oeo IGTG ACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLAGKNTETLRSX 
orf3a.pep | ,,,,,, ,n 1 1 I 1 1 1 1 1 II 1 1 I I I I I M II 1 1 I I 1 1 I II II I II II 

A< ~r-F3-l IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLPRKNPETSTAX 

4 -> orlJ 370 380 390 400 410 

Homology with hypothetical protein encoded by yvfc gene (accession Z71928) of B. subtilis 
ORF3 and YVFC proteins show 55% aa identity in 170 aa overlap (BLASTp): 

ORF3 3 IYLIRKNLGSPVFFFQERPGKDGKPFKMVKFRSMRDGLYSDGIPLPDGERLTPFGKKLRA 62 

I ++R +GSPVFF Q RPG GKPF + KFR+M D S G LPD RLT G+ +R 

yvfc 27 IAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 86 

ORF3 63 ASXDELPELWNILKGEMSLVGPRPLLMQYLPLYDNFQNRRHEMKPGITGWAQVNGRNALS 122 
S DELP+L N+LKG++SLVGPRPLLM YLPLY Q RRHE+KPGITGWAQ+NGRNA+S 

yvfc 87 LSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEKQARRHEVKPGITGWAQINGRNAIS 146 

ORF3 123 WDEKFACDVWYIDHFSLCLDXXXXXXXXXXXXXXEGISAQGEXTMPPFTG 172 
W++KF DVWY+D++S LD EGI T FTG 

yvfc 147 WEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEGIQQTNHVTAERFTG 196 
Homology with a predicted ORF from N. gonorrhoeae 

ORF3 shows 86.3% identity over a 286aa overlap with a predicted ORF (ORF3.ng) from N. 
gonorrhoeae: 

orf3 ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 34 

:MI!ilil llllli::lti!llllllllllll 
orf3ng MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 60 

orf3 SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 94 

1111:1 11111111:1111 1111111:1 It I I I I I I : I I 1 I I I 1 I 1 t I I 11 1 I I I I I 
orf3ng SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 120 

orf 3 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 154 

|::IMII MM III III III II Mill I II II: Ml II I: I I: I I.: I I 1 : 1 1 I I I I I 
orf 3ng YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 180 

orf3 IKEGISAQGEXTMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 214 

I 1 1 I 1 1 I I I 1 I M II : I : I I I M : It M I II I I I : I 1 I I I 1 I MMI MMMMM 
orf3ng IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG 240 

orf 3 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 274 

I M M I M M I I II II I : I : : M M M M I I II : I : M M I I II I Ml II MIMM 
orf3ng FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 300 

orf3 VGQGSVVMAKAV 286 

: 1 1 1 1 1 I 1 I I I I 

orf3ng IGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 360 

The complete length ORF3ng nucleotide sequence <SEQ ID 17> is: 

1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG 

51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA 

101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc 

151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA 

201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA 

251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC 

301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTTT TGATGCAGTA 

351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT 

501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG 

551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC 

601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA 

701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA 

851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA 

901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG 

1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA 

This encodes a protein having amino acid sequence <SEQ ID 18>: 



1 MSKAVKRLFD IIAS ASGLIV LSPVFLVLIY LI RKNLGSPV FFIRERPGKD 

51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV 

101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR 

201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL 

251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI 

301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 

351 GNTRIGEESR IGTGACSRQQ T TVGSGVTAG AGAVIVCDI P DGMTVAGNPA 

401 KPLTGKNPKT GTA* 
This protein shows 86.9% identity in 413 aa overlap with ORF3-1: 
orf3-1.pep 
orf3ng 

orf3-1.pep 
orf3ng 

orf3-1.pep 
orf3ng 

orf3-1.pep 
orf3ng 

orf3-1.pep 
orf3ng 

orf3-1.pep 
orf3ng 

orf3-1.pep 
orf3ng 



10 20 30 40 50 60 

MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

|]| | | | | I I r 1 I I I t I t I I I 1 I I r I I I !illlll::IIIIIIIS!MIIIII 

MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 

10 20 30 40 50 60 

70 80 90 100 110 120 

SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 
| | | | | | | | 1 | | | I I I : I i I I I I t I I i I : I I i I I I I S I i : I I I i M i I I I i I I I I I I I I I 
SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

130 140 150 160 170 180 

YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 

| : : | | | | | | | | I [ | I I I t I I I I I I I I I I II I I I : I I I I I I : I I : I I : I I I : I i I I I M 
YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 

130 140 150 160 170 180 

190 200 210 220 230 240 

IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 

||||t!tllllllMI:|:tilM:llllllllii:illlll 1 ! I I I ! I I I : I I i I M 
IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG 
190 200 210 220 230 240 

250 260 270 280 290 300 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

| ||illllt!MIIII|:i::illllllMIII:|:MIIII I I I I : I I I I I I I 1 I I 
FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 

250 260 270 280 290 300 

310 320 330 340 350 360 

VGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

: | | | | | || [ || I I I I I I 1 I I I I I I II I I ! I I I I I I I I : I I I I I t I I I I M II I : I I I I I 
IGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 

310 320 330 340 350 360 

370 380 390 400 410 

IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLPRKNPETSTAX 

1 | | | 1 | | | I I :|| :| 11111:1 I: I I I I I I I I I I I I I llhhlM 
IGTGACSRQQTTVGSGVTAGAGAVIVCDIPDGMTVAGNPAKPLTGKNPKTGTAX 
370 380 390 400 410 
In addition, ORF3ng shows significant homology with a hypothetical protein from B.subtilis: 

gnl|PID|e238668 (Z71928) hypothetical protein [Bacillus subtilis] 
>gi|1945702|gnl|PID|e313004 (Z94043) hypothetical protein [Bacillus subtilis] 
>gi|2635938|gnl|PID|e1186113 (Z99121) similar to capsular polysaccharide 
biosynthesis [Bacillus subtilis] Length = 202 

Score = 235 bits (594), Expect = 3e-61 

Identities = 114/195 (58%), Positives = 142/195 (72%) 

Query: 5 VKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFRSMRD 64 

+KRLFD+ A+ L S + L I ++R +GSPVFF + RPG GKPF + KFR+M D 
Sbjct: 3 LKRLFDLTAAIFLLCCTSVIILFTIAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTD 62 

Query: 65 ALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPLYNKF 124 

DS G LPD RLT G+ +R S+DELP+L NVLKG++SLVGPRPLLM YLPLY + 
Sbjct: 63 ERDSKGNLLPDEVRLTKTGRLIRKLSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEK 122 

Query: 125 QNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVLIKEG 184 

Q RRHE+KPGITGWAQ+NGRNA+SW++KF DVWY DN+SF+LD+KIL LTV+KVL+ EG 
Sbjct: 123 QARRHEVKPGITGWAQINGRNAISWEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEG 182 

Query: 185 ISAQGEATMPPFAGN 199 

I T F G+ 

Sbjct: 183 IQQTNHVTAERFTGS 197 

The hypothetical product of yvfc gene shows similarity to EXOY of R.meliloti, an 
exopolysaccharide production protein. Based on this and on the two predicted transmembrane 
regions in the homologous N. gonorrhoeae sequence, it is predicted that these proteins, or their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 4 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 19>: 

1 . . AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 

51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 

101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 

151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 

201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT.GGCGG CCTGGTCATT 

251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 

301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 

351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 

401 CAGTTTAG 

This corresponds to amino acid sequence <SEQ ID 20; ORF5>: 

1 . .NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT 
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKSPYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS XR RFCTV* 

Further sequence analysis revealed the complete DNA sequence to be <SEQ ID 21>: 

1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTTAACCCC GAGCAGTTCC ACCTCAAATC CATTCTCCGC CCCGCCGTCT 

401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG 

501 CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG 

551 ACGAGTTTGA CGAAGACGAT AGCGCCGACA ATATCCATGC CGTTTCTTCC 

601 GAACGCTGGC GCATCCATGC AGCTACCGAA ATCGAAGACA TCAACACCTT 

651 CTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATT CGGCCTGGTC 

701 ATTCAAGAGT TGGGACATCT GCCCGTGCGC GGCGAAAAAG TCCTTATCGG 

751 CGGTTTGCAG TTCACCGTCG CACGCGCCGA CAACCGCCGC CTGCATACGC 

801 TGATGGCGAC CCGCGTGAAG TAAGCACCGC CGTTTCTGCA CAGTTTAGGA 

851 TGACGGTACG GGCGTTTTCT GTTTCAATCC GCCCCATCCG CCAAACATAA 

This corresponds to amino acid sequence <SEQ ID 22; ORF5-1>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE 

51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDEFDEDD SADNIHAVSS 

201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR 

251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT* 

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 23>: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 

101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 

301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 
401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC GACATCGAAG 

551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 

601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 

651 TTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 

701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGGCGAAA AAGTCNTTAT 

751 CGGCGNNTTG CANTTCACNG TCGCCNGCGC NGACAACCGC CGCCTGCATA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCTC CGCCGTTTCT GTACAGTTTA 

851 GGATGACGGT ACGGGCGTTT TCTGTTTCAA TCCGCCCCAT CCGCCANACA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 24; ORF5a>: 

1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 
201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 
251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT 
301 * 

The originally-identified partial strain B sequence (ORF5) shows 54.7% identity over a 124aa 
overlap with ORF5a: 

10 20 1 30 

orf5.pep NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 
F P Mill II II II I Ml I Ml I M llllll: t 

orf5a FHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 

130 140 150 160 170 180 j 



40 50 60 70 80 90 

EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 

MUM |:| MM Mil:: 1111111111111:1111111 I III II III Ml I 
orf5a EDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGTPA 
190 200 210 220 230 240 



or f 5. pep 



100 110 120 130 

orf5.pep RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSXXXXXRRFCTV 

mill iii i i i:> immimmi 

orf5a RARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXTX 
250 260 270 280 290 300 

The complete strain B sequence (ORF5-1) and ORF5a show 92.7% identity in 300 aa overlap: 

10 20 30 40 50 60 

orf5a.pep MDGAQPKTNFXXRLIARLAREPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

p 1 1 1 1 1 1 m 1 1 iimmmmiiiiiii:iiimmiiimimmiiiiiiiiiiiii 

orf5-l MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf5a pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

MMiiMiiiMMiimimiiiimimiiimiiiimmmmm 

orf5-l RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf5a Pep EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 

iMiiMiiMiiiiiiiimimmmmmiiiiiimiimmiim 

orf5-l EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 
130 140 150 160 170 180 



orf5a.pep 



190 200 210 220 230 240 

DIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT 
: M M 1 I I 1 : 1 I I I I M t I : I I K 1 1 M I I 11 I M t : t U I I I M I I I M 1 III Ml 
orf5-l EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 
190 200 210 220 230 

250 260 270 280 290 300 
orf5a.pep PARARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXT
Itillll 111 I I 1:1 IIIIIIIMIMIII:IM:IIMIIIIIMllllli | 
orf5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSTAVSAQFRMTVRAFSVSIRPIRQT

240 250 260 270 280 290 

Further work identified a partial DNA sequence in N.gonorrhoeae <SEQ ID 25> which encodes
a protein having amino acid sequence <SEQ ID 26; ORF5ng>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 

Further analysis revealed the complete gonococcal nucleotide sequence <SEQ ID 27> to be: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 

151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 

201 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 

401 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGT GACATCGAAG 

551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC 

601 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 

651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 

701 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt 

751 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT 

851 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 

901 ATCCGCCAAA CATAA 

This encodes a protein having amino acid sequence <SEQ ID 28; ORF5ng-1>:

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 
51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 
201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 
251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP 
301 IRQT* 

The originally-identified partial strain B sequence (ORF5) shows 83.1% identity over a 135aa
overlap with the partial gonococcal sequence (ORF5ng):

orf5 NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 30 

llllllllllllilllililllillllhl 
orf5ng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 182

orf5 EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 90 

MM II l:MI: I 1:1 I:: I I I I I I It II M I : M I I I I : I I I I I E I Ml Ml ! 
orf5ng EDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242 

orf5 RARRKS P YRRFAVHRRTRRQPP PAYADGDPREVSX RRFCTV 131 

I I I I I I I I I I M I II I I i 1 M M : I I I I I M M IMMI 
orf5ng RARRKS PYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 287 

The complete strain B and gonococcal sequences (ORF5-1 & ORF5ng-1) show 92.4% identity in
304 aa overlap:



10 20 30 40 50 . 60 

orf 5ng-l . pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLEKVLDFAELEV 
| I I I I I I I I I I 1 I I I I I I I I I I I l i I I I II I t i I I I I M i I M I I I 1IIIMM::III
orf5-l MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf5ng-1.pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP

till Ill I II ■ Mil II I I I II I I I I I I I I II I 1 I II I I I I I I I 1 I I i I i I I I I 

orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf5ng-1.pep EQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG
|||MI|:||| IIIIIIIIMI1IIII IIIMIIIIIIMIMMM IIIIMIIIMII 
orf5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG
130 140 150 160 170 180

190 200 210 220 230 240 

orf5ng-1.pep DIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGT
: | | | | I I I I : I I I : I I : I I : I I I I I I I I I I I I I I I : I II I I i : I I I I I I I I III :M 
orf5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT

190 200 210 220 230

250 260 270 280 290 300 

orf 5ng-l . pep PARARRKS PYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMTVRS FS VS I RP 
25 M I I I II I i I I I t t I I I I 1 I I I I I : I I I I I I I M I I I I I I I I I I M : I I I I I I I 

orf5-l S ARARRKS PYRRFAVHRRTRRQPPPAYADGDPREVS TAVSAQFRMTVRAFSVS IRP 

240 250 260 270 280 290 

orf5ng-1.pep IRQTX

I I I I I 

orf 5-1 IRQTX 

300 I 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and
identified the following homologies:

Homology with hemolysin homologue TlyC (accession U32716) of H.influenzae
ORF5 and TlyC proteins show 58% aa identity in 77 aa overlap (BLASTp). 

ORF5 2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED 61 
HMAIV+DE+G SGLVT EDI+EQIVG+IEDEFDE++ AD I +S T+ + A T+I+D 
TlyC 166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224

ORF5 62 INTFFGTEYSIEEADTI 78 

N F T++ EE DTI 
TlyC 225 FNAQFNTDFDDEEVDTI 241

ORF5ng-1 also shows significant homology with TlyC:

SCORES Initl: 301 Initn: 419 Opt: 668 

Smith-Waterman score: 668; 45.9% identity in 242 aa overlap 

10 20 30 40 50 

orf5ng-1.pep MDGAQPKTNFFERLIARLAR-EPDSAEDVLNLLRQAHEQEVFDADTLTRLEK

IN: |::|: : I : I :::::: I :::::::: I :| :| 
tlyc_haein MNDEQQNSNQSENTKKPFFQSLFGRFFQGELKNREELVEVIRDSEQNDLIDQNTREMIEG
10 20 30 40 50 60

55 60 70 80 90 100 109 

orf5ng-1.pep VLDFAELEVRDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGE--DKDEVLGILH
| : : : | 1 | : | | I I I 11:: :::::::: : I : : I I I I I 1 1 1 : : I : I : : : I I I I 

tlyc_haein VMEIAELRVRDIMIPRSQIIFIEDQQDLNTCLNTIIESAHSRFPVIADADDRDNIVGILH

70 80 90 100 110 120 






110 120 130 140 150 160 

orf5ng-1.pep AKDLLKYMF-NPEQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL
Ml IN:: : 1 I 1 : 1 : 1 I I : I : I I I : I : :II:M :l I I I I 1 : I I : I : : I I I 
tlyc_haein AKDLLKFLREDAEVFDLSSLLRPVVIVPESKRVDRMLKDFRSERFHMAIVVDEFGAVSGL
130 140 150 160 170 180

170 180 190 200 210 220 

orf5ng-1.pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEAD
M: I I I : I II II I I I I II 1 I: I I I |:::| : : ::| I: I: I: I I |:|:: : I h I 
tlyc_haein VTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDDFNAQFNTDFDDEEVD
190 200 210 220 230

230 240 250 260 270 280 

orf5ng-1.pep TIRRLGHSGIG-TPARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQF
I I I : : I I I : 

tlyc_haein TIGGLIMQTFGYLPKRGEEIILKNLQFKVTSADSRRLIQLRVTVPDEHLAEMNNVDEKSE
240 250 260 270 280 290 

Homology with a hypothetical secreted protein from E.coli: 

ORF5a shows homology to a hypothetical secreted protein from E.coli: 

sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION
>gi|1778577 (U82598) similar to H. Influenzae [Escherichia coli] >gi|1786879
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292

Score = 212 bits (533), Expect = 3e-54

Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%) 

Query: 2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVXDFSDLEV 60 

D K F L+++L EP + +++L L+R + + ++ D DT LE V+D +D V 
Sbjct: 10 DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV 69 

Query: 61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVXGILHAKDLLKYM-FN 119 

RD MI RS+M LK N +++ +I++AHSRFPVI EDKD + GIL AKDLL +M + 

Sbjct: 70 RDIMIPRSQMITLKRNQTLDECLDVIIESAHSRFPVISEDKDHIEGILMAKDLLPFMRSD 129 

Query: 120 PEQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIV 179 

E F + +LR AV VPE K + +LKEFR QR HMAIVIDE+GG SGLVT EDI+E IV 
Sbjct: 130 AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV 189 

Query: 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229 

G+IEDE+DE++ D +S W + A IED N FGT +S EE DT 
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238 
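
The percentages printed in the summary line above can be checked against the raw counts. The following is a minimal sketch in Python, not part of the original analysis; the function name `check_fraction` is hypothetical. It assumes the summary line has the form shown above, and it compares each printed percentage with the integer part of the ratio, since the figures printed in this document appear to be truncated rather than rounded.

```python
import re

def check_fraction(line: str, label: str):
    """Return (count, total, printed_pct, truncated_pct) for one field."""
    pattern = rf"{label}\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
    match = re.search(pattern, line)
    if match is None:
        raise ValueError(f"field {label!r} not found")
    count, total, printed = (int(g) for g in match.groups())
    # Integer division drops the fractional part, e.g. 48.7 becomes 48.
    return count, total, printed, (100 * count) // total

summary = "Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%)"
for field in ("Identities", "Positives", "Gaps"):
    print(field, check_fraction(summary, field))
```

For the line above, each printed percentage agrees with the truncated ratio.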

Based on this analysis, including the amino acid homology to the TlyC hemolysin-homologue from
H. influenzae (hemolysins are secreted proteins), it was predicted that the proteins from
N.meningitidis and N.gonorrhoeae are secreted and could thus be useful antigens for vaccines or
diagnostics.



ORF5-1 (30.7kDa) was cloned in the pGex vector and expressed in E.coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 2A shows
the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein was used 
to immunise mice, whose sera were used for Western blot analysis (Figure IB). These experiments 
confirm that ORF5-1 is a surface-exposed protein, and that it is a useful immunogen. 



Example 5 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 29>:

1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC
51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC
101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC
151 SScCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA
201 SmStoS GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAgG
251 StGCAAC GCCGCCTGAA TGAgGGCATG GGAAAGCAGG CAGGACGGGC
301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA
351 aSSaacag GGCATGAAGC CGASCsCGAC CATGTCGCTT ccgtcttcgt
401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT
451 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGACCTGCGC
501 SSSc CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC
551 GATTGCGCTG CCC..

This corresponds to the amino acid sequence <SEQ ID 30; ORF7>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP
51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL
101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY
151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP..
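
The correspondence between a nucleotide listing and its amino acid sequence can be illustrated with a short translation routine. The following is a minimal sketch in Python, assuming the standard genetic code and reading frame 1; it is an illustration only and not the software used to generate the sequences in this document. The function name `translate` is hypothetical. Codons containing ambiguous bases are reported as X, as in the listings above.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give X."""
    seq = "".join(dna.split()).upper()
    residues = []
    for pos in range(0, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(residues)

# First 30 nucleotides of <SEQ ID 29> as listed above.
print(translate("ATGCGCGGCG GCAGGCCGGA TTCCGTTACC"))  # MRGGRPDSVT
```

The output matches the first ten residues of ORF7 <SEQ ID 30>.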

Further sequence analysis revealed the complete DNA sequence <SEQ ID 31>: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC
51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT
101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA
151 cttcSgaag ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC
201 ctaIgttttg GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC
251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG
301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT
351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGGACACGAC ACCAAAGGCT
401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC
451 GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG
501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC
551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT
601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA
651 AGC?GACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG
701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA
751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA
801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA
851 aSgcaS CGATGCCGCC GCCCATCCGT CCGGCGAAAA atacctgtat
901 ttcgtgtcca aaatggacgg cacgggcttg agccagttca gccatgattt
951 gaccgaacac aatgccgccg tccgcaaata tattttgaaa aaataa

This corresponds to the amino acid sequence <SEQ ID 32; ORF7-1>:

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK
51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR
101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG
151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP
201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA
251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY
301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K*

Computer analysis of this amino acid sequence gave the following results:

Homology with hypothetical protein encoded by yceg gene (accession P44270) of H.influenzae
ORF7 and yceg proteins show 44% aa identity in 192 aa overlap: 

ORF7 1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA EVAPDAFSG 55 

, p. y+ IEG F RK ++ P + K SNh++ A tt -r 

yceg 102 LNSGKEVQFNVKWIEGKTFKDWRKDLENAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161

ORF7 56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115

yceg 162 NVEGWLYPDTYNYTPKSTDLELLKRSAERMKKALNKAWNERDEDLPLANPYEMLILASIV 221

ORF7 116 EKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYT 175

trvFTP VASVF+NRLK M+LQT +VIYGMG Y G IRK DL TPYNiX 

yceg 222 EKETGIANERAKVASVFINRLKAKMKLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281

ORF7 176 RGGLPPTPIALP 187

GLPPTPIA+P 
yceg 282 IDGLPPTPIAMP 293 



The complete length YCEG protein has sequence: 



1 MKKFLIAILL LILILAGVAS FSYYKMTEFV KTPVNVQADE LLTIERGTTS

51 SKLATLFEQE KLIADGKLLP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD 

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN 

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT 

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA 

301 VANPEKTDFY YFVADGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 

Homology with a predicted ORF from N.meningitidis (strain A)

ORF7 shows 95.2% identity over a 187aa overlap with an ORF (ORF7a) from strain A of N. 
meningitidis: 

10 20 30 

orf 7 . pep MRGGRPDSVTVQIIEGSRFSHMRKVIDATP 

I I I I I I I I I II I I I I I I I I I I I I 

orf 7a AAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDATP 
70 80 90 100 110 120 

40 50 60 70 80 90 

or f 7 . pep DIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLN 
II I I I I I I I I II I Ml 1 I II I I I I I I I I I I I I I II I I I I I I I I I : I I I I II I I I I I I I 
orf 7a DIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAMQRRLN 
130 140 150 160 170 180 

100 110 120 130 140 150 

or f 7 . pep EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVI Y 

I I I I I I I II I I I I I II I I I I I I 1:11111111 I I II I I I I I I I I I I I I II I I I I I 
orf 7 a EAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY . 

190 200 210 220 230 240 

160 170 180 

orf7.pep GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP

I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I 
orf7a GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVSKM
250 . 260 270 280 290 300 

orf7a DGTGLSQFSHDLTEHNAAVRKYILKKX 
310 320 330 

The complete length ORF7a nucleotide sequence <SEQ ID 33> is: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT 

101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 34>:

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG

151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 

A leader peptide is underlined. 

ORF7a and ORF7-1 show 98.8% identity in 331 aa overlap: 



                    10        20        30        40        50        60
orf7a.pep   MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7-1      MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf7a.pep   HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7-1      HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf7a.pep   IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM
            ||||||| |||||||||||||||||||||||||||||||||||||||||| ||| |||||
orf7-1      IDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAM
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf7a.pep   QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD
            ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
orf7-1      QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf7a.pep   PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7-1      PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY
                   250       260       270       280       290       300

                   310       320       330
orf7a.pep   FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX
            ||||||||||||||||||||||||||||||||
orf7-1      FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX
                   310       320       330
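
The identity figure quoted for ORF7a and ORF7-1 can be reproduced by direct comparison, since the two sequences align without gaps. The following is a minimal sketch in Python; the sequences are transcribed from the listings in this Example, and the function name `identity` is hypothetical. It is offered as an illustration, not as the alignment program used for the analysis.

```python
ORF7A = (
    "MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRK"
    "LAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGR"
    "PDSVTVQIIEGSRFSHMRKVIDATPDIEHDTKGWSNEKLMAEVAPDAFSG"
    "NPEGQFFPDSYEIDAGGSDLRIYQIAYKAMQRRLNEAWESRQDGLPYKNP"
    "YEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAA"
    "YKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY"
    "FVSKMDGTGLSQFSHDLTEHNAAVRKYILKK"
)
ORF7_1 = (
    "MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRK"
    "LAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGR"
    "PDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSG"
    "NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNP"
    "YEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAA"
    "YKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY"
    "FVSKMDGTGLSQFSHDLTEHNAAVRKYILKK"
)

def identity(a: str, b: str):
    """Count identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same, len(a), 100.0 * same / len(a)

same, total, pct = identity(ORF7A, ORF7_1)
print(f"{same}/{total} identical = {pct:.1f}%")
```

The four differing positions are residues 128, 171, 175 and 210.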



Homology with a predicted ORF from N.gonorrhoeae

ORF7 shows 94.7% identity over a 187aa overlap with a predicted ORF (ORF7ng) from N.
gonorrhoeae:



orf7        MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7ng      MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60

orf7        FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 120
            |||||||||||||||||||||||||||||||||  ||||||||||||||||| | |||||
orf7ng      FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 120

orf7        HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 180
            |||  |||||||||||||||||||  ||||||||||||||||||||||||||||| ||||
orf7ng      HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 180

orf7        PTPIALP 187
            || ||||
orf7ng      PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 236

An ORF7ng nucleotide sequence <SEQ ID 35> is predicted to encode a protein having amino acid 
sequence <SEQ ID 36>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL 

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVNRLKIGM RLQTDPSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG 

201 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK* 
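
The sequence listings in this document are set out in blocks of ten residues, five blocks per line, with the position of the first residue at the start of each line. The following is a minimal sketch in Python of that layout, given as an illustration only; the function name `format_listing` and its parameters are hypothetical.

```python
def format_listing(seq: str, block: int = 10, per_line: int = 5) -> str:
    """Lay out a sequence in numbered lines of space-separated blocks."""
    width = block * per_line
    lines = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # The leading number is the 1-based position of the first residue.
        lines.append(f"{start + 1} " + " ".join(blocks))
    return "\n".join(lines)

# First 60 residues of <SEQ ID 36> as listed above.
print(format_listing(
    "MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ"
))
```

The first printed line begins at position 1 and the second at position 51, as in the listing above.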

Further sequence analysis revealed a partial DNA sequence of ORF7ng <SEQ ID 37>: 

1 . . taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGTCGG TCGGCAGGAA 

51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG 

101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG 

151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG 

201 GCCGGATTCC GTTACCGTGC AGATTATCGA AGGTTCGCGT TTTTCGCATA 

251 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC 

301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG

351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG 

401 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC 

451 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT ATAAAAACCC 

501 TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG 

551 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC 

601 GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC 

651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA 

701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC 

751 Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa 

801 tttcgtgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT 

851 TGACCGAACA CAACGCCGCc gTcCGCAAAT ATATTTTGAA AAAATAA 

This corresponds to the amino acid sequence <SEQ ID 38; ORF7ng-1>:



1 . . YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL 

51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG 

101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR 

151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI 

201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG 

251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK* 

ORF7ng-1 and ORF7-1 show 98.0% identity in 298 aa overlap:



                 10        20        30        40        50        60
orf7-1.pep  KLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSRHVL
                                          ||||||||||||||||||||||||||||||
orf7ng-1                                  YRIKIAKNQGISSVGRKLAEDRIVFSRHVL
                                                  10        20        30

                 70        80        90       100       110       120
orf7-1.pep  TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7ng-1    TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA
                    40        50        60        70        80        90

                130       140       150       160       170       180
orf7-1.pep  TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7ng-1    TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR
                   100       110       120       130       140       150

                190       200       210       220       230       240
orf7-1.pep  LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV
            |||||  ||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf7ng-1    LNEAWAGRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV
                   160       170       180       190       200       210

                250       260       270       280       290       300
orf7-1.pep  IYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVS
            ||||||||||||||||||||||||||| |||||| |||||||| ||||||||||||||||
orf7ng-1    IYGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALPGKAAMDAAAHPSGEKYLYFVS
                   220       230       240       250       260       270

                310       320       330
orf7-1.pep  KMDGTGLSQFSHDLTEHNAAVRKYILKKX
            |||||||||||||||||||||||||||||
orf7ng-1    KMDGTGLSQFSHDLTEHNAAVRKYILKKX
                   280       290

In addition, ORF7ng-1 shows significant homology with a hypothetical E.coli protein:

sp|P28306|YCEG_ECOLI HYPOTHETICAL 38.2 KD PROTEIN IN PABC-HOLB INTERGENIC REGION
>gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but
has 97 additional C-terminal residues [Escherichia coli] Length = 340

Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57

Identities = 20/87 (22%), Positives = 40/87 (45%) 

Query: 10 GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD 69
° Y ' G ++G +L D+I+ V + + GTYR +++ ++L+ + G+ 

Sbjct: 49 GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108

Query: 70 SVTVQIIEGSRFSHMRKVIDATPDIGH 96 

++++EG R S K + P I H 
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135 

Score = 438 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 84/155 (54%), Positives = 111/155 (71%)

Query: 120 EGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEK 179

EG F+PD++ A +D+ + + A+K M + ++ AW GR DGLPYK+ +++ MAS+IEK 
Sbjct: 158 EGW PDTWMYTANTTDVALLKRAHKKMVKAVDSAWEGRADGLPYKDKNQLVTMASIIEK 217

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239

ET ++RD VASVF+NRL+IGMRLQTDP+VIYGMG Y GK+ +ADL T YNTYT 
Sbjct: 218 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277 

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274

GLPP IA PG ++ AAAHP+ YLYFV+ G 
Sbjct: 278 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312

Based on this analysis, including the fact that the H.influenzae YCEG protein possesses a possible
leader sequence, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 6 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 39>: 

1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AGATATTTAC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC 

251 TGGCAACCTA TATGCTGATG TTGGAACGCA CAAAATCCCC CGAAGTCGCC 

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA 

351 AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC 

401 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG 

451 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG 

This corresponds to the amino acid sequence <SEQ ID 40; ORF9>: 

1 ..RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN
51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA
101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ
151 HLDGREEVLA QADEGQ

Further sequence analysis revealed the complete DNA sequence <SEQ ID 41>: 



1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT
51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT ATGAAACAGC
101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA
151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA
201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC CGCCTTGCAA AAGGGGCAGG
251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC
301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT
351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC
401 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA
451 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA
501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC
551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG
601 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG
651 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG
701 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG
751 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA
801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG
851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG
901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC
951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT
1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG
1051 ATGATGTATG CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA
1101 AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG
1151 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGCG GCAGATCGGC
1201 AGGGTGCGGA AACTTCCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA
1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG
1301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT
1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA
1401 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA
1451 GGCTTGCACC CGATAACGCT CAGATTATGA ATAATCTGGG CTACAGCCTG
1501 CTGACCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC
1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG
1601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT
1651 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT
1701 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG
1751 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT
1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA



This corresponds to the amino acid sequence <SEQ ID 42; ORF9-1>:



1 MLPNRFKMLT VLTATLIAGQ VSAAGGGAGD MKQPKEVGKV FRKQQRYSEE
51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS
101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE
151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA
201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL
251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL
301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA
351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG
401 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS
451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL
501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY
551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR
601 HGIALPQPSR KPRK*



Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N.meningitidis (strain A)

ORF9 shows 89.8% identity over a 166aa overlap with an ORF (ORF9a) from strain A of N. 
meningitidis: 

                        10        20        30        40        50
orf9.pep        RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA
                ||  | || | | |||  ||  || | | |||||||||||||||||||||||||||
orf9a       MLPARFTILSVLAAALLAGQAYAA--GAADAKPPKEVGKVFRKQQRYSEEEIKNERARLA
                    10        20          30        40        50

              60        70        80        90       100       110
orf9.pep    AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA
            |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf9a       AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA
            60        70        80        90       100       110

             120       130       140       150       160
orf9.pep    EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ
            |||||||||||||||||||||||||||||||||||||| || |||||| |
orf9a       EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ
           120       130       140       150       160       170

orf9a       AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI
           180       190       200       210       220       230



The complete length ORF9a nucleotide sequence <SEQ ID 43> is: 



1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT
51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG
101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC
151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT
201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA
251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA
301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA
351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA
401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA
451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG
501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG
551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA
601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA
651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC
851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC
901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG
951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA
1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA
1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT
1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG
1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG
1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC
1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT
1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA
1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT
1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG
1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC
1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA
1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT
1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT
1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC
1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC
1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA



This encodes a protein having amino acid sequence <SEQ ID 44>: 



1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI
51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE
101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG
151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR
201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER
301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI
351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV
401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT
451 ELQAEALVQR SVVYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG
601 IALPQPSRKP RK*



ORF9a and ORF9-1 show 95.3% identity in 614 aa overlap: 



10 20 30 40 50 

5 orf 9a . pep MLPARFTILSVLAAALLAGQAYAAG — AAD AK P PKE VGKV FRKQQR Y S E E E I KNE RARLA 

Ml M : I : I I : I : I : I I I : III I : I I I I t I 1 I I I I I I I I t I I I I I I i I I I I II 
orf 9-1 MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 

10 20 30 40 50 60 

10 60 70 80 90 100 110 

or f 9a . pep AVGERVNQI FT LLGXETALQKGQAGT ALATYMLMLERTKS PEVAERALEMAVSLNAFEQA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | I || | | | | | | | M | | | | | | | | j | | | | 
o r f 9 - 1 AVGERVNQI FTLLGGETALQKGQAGT ALATYMLMLERTKS PE VAERALEMAVS LNAFEQA 

70 80 90 100 110 120 

120 130 140 150 160 170 

orf 9a . pep EMI YQKWRQ I E P I PGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 
I I I M I I t I I I I I I I I I I I I I I I I 1 I I I I I I I I t II I I I I I I I I I I I | | | I | | | I M I 
orf 9-1 EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
20 130 140 150 160 170 180 

180 190 200 210 220 230 

orf 9a . pep AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADWFSVQXREKEKAIGALQRLAKLDTEI 
MINI M I Mil II I Mil l: I Ml I I I II I I I II III illlllllllllllllllll 
25 orf 9-1 AAVQQDGLAQKASKAVRRAALKYEHLPEAAVAD WFS VQGREKEKAI GALQRLAKLDTE I 

190 200 210 220 230 240 



240 250 260 270 280 290 

or f 9a . pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I II I I I I I I I I II I I II I I I I I I I I II 
orf 9-1 LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
250 260 270 280 290 300 




300 310 320 330 340 350 

orf 9a . pep ERNPNADLYIQAAILAANRKEXASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYT 
MINI I I I I I 1 I I I ! I J I I I I I I I I I I I I I I I I I I I I MM II: II I 1:1 I M I I I: 
orf 9-1 ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 
310 320 330 340 350 360 

360 370 380 390 400 410 

orf 9a . pep KTOC2WLKKVSAPEYLFDKGVLAAAAAWLDXGRAALRQIGRVRKLPEQQGRYFTADNLSK 
I I I I I I I I I I M M I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I II I || I I I I M II 11 I 
or f 9- 1 KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
370 380 390 400 410 420 

420 430 440 450 460 470 

or f 9a . pep IQMFALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSWYDRLGKRKKMISDLE 
MI:H IN MM 111 II III II III Ml III I III MM I IN Ml I IIIM MM III 
orf 9-1 IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSWYDRLGKRKKMISDLE 
430 440 450 460 470 480 



480 490 500 510 520 530 

orf 9a . pep RAFRLAPDNAQIMNN LG YS LLS DSKRLDEGFALLQTAYQINPDDTAVNDS I GWAYYLKXD 
II M II II I I II I I II I I II I : I II I I I I II II II II II I I I I II I I M II I II II II I 
orf 9-1 RAFRLAPDNAQIMNN LGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 

490 500 510 520 530 540 



540 550 560 570 580 590 

orf 9a . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
I M I I 1 1 II I II I II M I II I II I M II I I 1 1 II M It M I II I I M I I I II M II M I I 
orf 9-1 AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
550 560 570 580 590 600 



600 610 
65 orf 9a. pep HGIALPQPSRKPRKX 

1 1 1 ! 1 1 1 i 1 1 : 1 1 1 < 

orf 9-1 HGIALPQPSRKPRKX 
610 
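
The identity figures quoted for these alignments (for example "95.3% identity in 614 aa overlap") can be recomputed from any pair of aligned sequences. The following is a minimal sketch, not the program that produced the figures above. It assumes the two sequences are already aligned, that gaps are written as "-", and that every alignment column counts toward the overlap; the function name `percent_identity` and the toy input are illustrative only.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (identical, overlap, percent) for two pre-aligned sequences.

    Assumption: every alignment column counts toward the overlap, and a
    column is identical only when both residues match and neither is a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    overlap = len(seq_a)
    if overlap == 0:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return identical, overlap, 100.0 * identical / overlap


# Illustrative toy alignment, not taken from the sequences above.
identical, overlap, pct = percent_identity("MLPAR-F", "MLPNRGF")
print(f"{pct:.1f}% identity in {overlap} aa overlap")
```

Alignment programs differ in how they treat gap columns and unknown residues (X), so a figure recomputed this way may differ slightly from the reported one.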
Homology with a predicted ORF from N. gonorrhoeae

ORF9 shows 82.8% identity over a 163aa overlap with a predicted ORF (ORF9.ng) from N. 
gonorrhoeae: 

RFKMLTV LTATLIAGQVSAAGGGAGDMKQEKEVGKVFRKQQRYSEEEIKNERAR 54 
° rf9 || : |: | |: |: I: I I I: II MM:: I I I I I I I M I : M I I II 1 1 I I I I I I 

orfSng MIMLPARFTILSVLAAALLAGQAYAA — GAADVELPKEVGKVLRKHRRYSEEEIKNERAR 58 

114 



„ rfQ LAAVGERVNQI FTLLGGETALQKGQAGTALATYMLMLERTKS PEVAERALEMAVSLNAFE 

° Ml llllll:: 1 1 | II M I M II I II I I I I II I 1 1 1 1 1 1 1 I II I M I I I I II 1 1 I I MM 

orf9ng ^OTR^VnLI^TAMKG<^AIATYtH^RTKSPEVAERAI^VSL^ 



orf9 
orf9ng 



QAEMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVtAQADEGQ 166 
I I I I I I I II I I II I II I : II I II II I II I : I M Ml III MM' 

QAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNPHLDRLEEVPAQSDYVHQPMIFLLL 17 8 



The ORF9ng nucleotide sequence <SEQ ID 45> was predicted to encode a protein having amino
acid sequence <SEQ ID 46>:

1 mtmt.PARFT I LSVLAAALLA GQAYAAGA AD VELPKEVGKV LRKHRRYSEE 

51 eiknERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE

151 GGNPHLDRLE EVPAQSDYVH QP MIFLLLVQ AAVQHGGVA Q KPSKAVRPAA 

201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF 

251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRHNPT 

Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane
domain.
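
This passage does not say which method produced the transmembrane prediction. As an illustration only, hydrophobic stretches of the kind reported for residues 173-189 can be located with a sliding-window hydropathy average. The sketch below assumes the Kyte-Doolittle scale, a 9-residue window and a cut-off of 1.6; all three are choices made here, not taken from the text, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(res, 0.0) for res in protein.upper()]
    if window < 1 or window > len(scores):
        return []
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_windows(protein, window=9, threshold=1.6):
    """1-based (start, end) positions of windows whose mean reaches the threshold."""
    return [(i + 1, i + window)
            for i, value in enumerate(hydropathy_profile(protein, window))
            if value >= threshold]


# Residues 173-189 of <SEQ ID 46> as printed above (spacing removed).
segment = "MIFLLLVQAAVQHGGVA"
print(hydrophobic_windows(segment))
```

A window-average plot is only a rough screen; dedicated topology predictors apply additional rules.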
Further sequence analysis revealed the complete length ORF9ng DNA sequence <SEQ ID 47>: 

1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT 
51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG 
101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC 
151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT 
201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA 
251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 
301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA 
351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg 
401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa 
451 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT 
501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg 
551 aTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG 
601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA
651 GGGaScGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC
851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC 
901 AACCCGAATG CAAACCTGTA TATTCAGGCG GCGATATTGG CGGCAAACCG 
951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 
1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA 
1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT 
1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG 
1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 
1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 
1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC 
1301 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG 
1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT 
1401 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA 
1451 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC
1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA
1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT
1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt
1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC
1751 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA 
1801 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA 
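
The relationship between a nucleotide sequence such as the one above and the protein it encodes can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code, reading frame 1 and the numbered-block layout used in this document. The names `CODON_TABLE` and `translate` are illustrative, and any codon containing an ambiguous base (such as N) is reported as X, matching the X residues shown in the protein listings.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate frame 1 of a DNA string; whitespace and case are ignored.

    Codons containing anything other than A, C, G or T become 'X'.
    Stop codons are written as '*'. A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


# First 30 bases and final 9 bases of <SEQ ID 47> as printed above.
print(translate("ATGTTACCCG CCCGTTTCAC TATTTTATCT"))
print(translate("CGGAAATAA"))
```

Row numbers must be stripped before translation, because `translate` only removes whitespace.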

This encodes a protein having amino acid sequence <SEQ ID 48>: 



1 MLPARFTILS VLAAALLAGQ AYAAGAADVE LPKEVGKVLR KHRRYSEEEI
51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE
101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGEAQKPAG WLRNVLKEGG
151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK
201 YEHLPEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH
301 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI
351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV
401 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST
451 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG
601 IALPEPSRKP RK*



ORF9ng and ORF9-1 show 88.1% identity in 614 aa overlap: 



10 20 30 40 50 60 

orf 9-1 . pep MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
III i I : I : I I : I : I : I I I : III I : I : : I I I I I I I : I I : : I I II I I I I I I I I I I I 
orf9ng-l MLPARFT ILSVLAAALLAGQAYAAG — AADVELPKEVGKVLRKHRRYSEEE IKNERARLA 

10 20 30 40 50 

70 80 90 100 110 120 

orf 9-1 . pep AVGERVNQI FTLLGGETALQKGQAGTALAT YMLMLERTKS PEVAERALEMAVSLNAFEQA 

I I I I I I I:: I I I II I I I I I I II I I I I II I II I II I I I I I II I I I I I I I I i I I I I I I I I I I 
orf9ng-l AVGERVNRVFTLLGGETALQKGQAGTALAT YMLMLERTKS PEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 

130 140 150 160 170 180 

orf 9-1 . pep EM I YQKWRQ I E P I PGKAQKRAGWLRNVLRERGNQHLDGLEE VLAQADEGQNRRVFLLLAQ 
I I I I I I I I I I I I I I I : I II 11111111:1 I I I I I I I I : I I I I I : I : I : I I : I I I I : I 
orf9ng-l EMI YQKWRQIEPI PGEAQKPAGWLRNVLKEGGNQHLDGLKEVLAQS DDVQKRRI FLLLVQ 

120 130 140 150 160 170 

190 200 210 220 230 240 

orf 9-1 . pep AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADWFSVQGREKEKAIGALQRLAKLDTEI 

I I II I I: I I I I I I II I II I I I I I I I I I I I I I I : II : I I I I I I I I I I I I I I I I I I I I I I 
orf 9ng-l AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI 

180 190 200 210 220 230 

250 260 270 280 290 300 

orf 9-1 . pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
Mill I llll Mil Mill I II Mill Ml I INI II IN II Ml I:: IIIIIIIIMI 
orf9ng-l LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKPDDAYARLNVLL 
240 250 260 270 280 290 

310 320 330 340 350 360 

orf 9-1 . pep ERN PNADLY I QAAILAANRKEGAS VI DGYAEKAYGRGTEEQRSRAALTAAMMYADRRD YA 
I : I I I I : I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I : I I I : I I I I : I I I I I I I I 
orf9ng-l EHNPNANLYIQAAILAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYA 
300 310 320 330 340 350 

370 380 390 400 410 420 

orf 9-1 .pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 

II I I I I II I II I I I I I I I I I I I I I I I : II II I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
orf9ng-l KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 

360 370 380 390 400 410 

430 440 450 460 470 480 

orf 9-1 .pep IQMLALSKLPDKREALRGLDKI I EKPPAGS NTELQAEALVQRS WYDRLGKRKKMI S DLE 
I I M I I I I I I I I I I I I ll::ll I l:::ll I I I 1 : 1 1 I : : I : : : I I I I M : I I I 
orf9ng-l IQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLE 
420 430 440 450 460 470 



490 500 510 520 530 540
RAFRIAPDNRQI^LGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIOTA^LKGD

orf9ng _ 1 ^tp^a^^ 

ortsng l 49Q 50Q 510 52 o 530 



orf9-l.pep 



5 „ 0 560 570 580 590 600 

« -, „ AF<3ALPYLRYSFENDPEPEVAAHIiGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 

orf9ng-l aIsalpYLRYSFE^ 
- 54Q 550 570 

610 

orf 9-1 .pep HGIALPQPSRKPRKX 
P : 1 1 I I I : I I I I I I I I 

1* orf9ng-l YGIALPEPSRKPRKX 

iJ 600 610 

In addition, ORF9ng shows significant homology with a hypothetical protein from P.aeruginosa: 

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION
>gi|1072999|pir||S49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259
(X82071) orf 3 [Pseudomonas aeruginosa] Length = 576
laenti^es 2 ! ^^^^^ "8/587 (38,,, Gaps - 125/587 (21%, 

25 Query: 67 vm-—^ 126 

SbjCt: 53 LYSL^VAELAGQRNRFDIALSNYVVQAQKTRDPGVSERAFRIAEYI<GADQEALDTSLLWA 112 

.„ OTPPIPGEAOKPAG WLRNVtKEGGNQHLDGLKEVLAQSDDVQKRRI 172 . 

Query: 127 Q 18 "™^? 3 ++ VL G+ H D L A++D + + 

30 Sbjct: 113 rsAP^NLDAQRAAAIQIARAGRYEESMVYMEKVIjNGQGDTHFDFIA^ 112 | 

Query: 173 E^XXXXXXXXXXXXX^ 232 
35 Sbjcf 173 L QSFDHLLKKYPNNGQLLFGKALLLQQDGRPDEALTLLEDNS 214 

Query: 233 KLDTEILPPTLMTLRLTARK ^p E ^^q^^^^+^+^^^+ E ^ M1 'lV^^+ E> ^ 

Sbjct: 215 ASRHEVAPLLLRSRLLQSl^RSDEALPLIJCAGIKEHPDDKRVRIAYA LVEQNRL 270 

40 Query: 288 DDAYARLNVLLEHNPN ANLYIQAAI 312 

SbjCt: 271 SS^FAGtvQQFPDDDDDLRFSIALVCI^QAWDEARIYLEELWRDSHVDAAHFNLG 330 

45 Query: 313 -IAANRKEGASVIDGYAEKAYGRGTGEQRGRARMTAAMIYADRRDYWCVRQ 371 

Sbjct: 331 R^EQ^ARA^E^^ 388 
Query: 372 YLFDKXXXXXXXXXXXX^^ 431 

50 Y . AIQLYLIEAEALSNNDQQE 408 

Sbjct: 389 Y 

Query: 432 EALIGLNNIIAKLSAAGSTEPIAEAIAQRSIIYE^ 
55 Sbjct: 409 SwQAIQEGLKQYP EDL-NLLYTRSMLAEKRNDLAQMEKDLRFVIAREPDNAMAL 462 

Query: 492 -™f^^^ ™ 
Sbjct: 463 Sa^SlWtTRYGEARELILK^^ 522 
60 Query: 552 ND PEPEVAAHI^EVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 598 

u J p+ EVAAHLGEVLWA G+A+W+ +D+R T+KR 

Sbjct: 523 RYPDHEVAAHLGEVLWAQGRQGDARAIWREYLDKQPDSDVLRRTIKR 569 

gi|2983399 (AE000710) hypothetical protein [Aquifex aeolicus] Length = 545

SSi^ SX^SU,^ J~ - 9^ ,48%) . Gaps - 19/198 (9%, 



Query: 408 GRYFTADNL-SKIQMIiALSRXPDKREALIGlJTOIIAKLSAAGSTEPLAEALAQ 459 

7 Q ° y G Y A L K ++LA PDK+E L + +K 
Sbjct: 335 GNYEDAKRLIEKAKVLA PDKKEILFLEADYYSKTKQYDKALEILKKLEKDYPNDSR 390

Query: 460 RSIIYEQFGKRGKMIADLETALKLTPDNAQIMNNLGYSLLS — DSKRLDEGFALLQ 513
+I+Y+ G L A++L P+N N LGYSLL +R++E L++
Sbjct: 391

Query: 514
A + +P++ A DS+GW YYLKGD E A+ YL + E +P V H+G+VL +G +
Sbjct: 451 KALEKDPENPAYIDSMGV^YLKGDYERAMQYLLKALREAYDDPVVNEHVGDVLLKMGYK 510

Query: 573 DQAVDVWTQAAHLRGDKK 590
++A + + +A L + K
Sbjct: 511 EEARNYYERALKLLEEGK 528

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 7 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 49>:

1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 

51 CAACCTGCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC 

101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CATCAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA 

301 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CGaCTGGGCG 

351 GCTGCCTGCC TATGCTGTTG CAAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 

451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG 

501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 

551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT 

601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 

651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AAAACAACGC 

701 GCCCAAGGCG AAGTCGTTTC CTAA 

This corresponds to the amino acid sequence <SEQ ID 50; ORF11>:

1 ..NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG
51 WAIIVLTIIV KAVLYPLTNA SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ
101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFASVE LRQAPWLGWI
151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMPLVFSXXF
201 FFFPAGXVLY WVVNNLLTIA QQWHINRSIE KQRAQGEVVS *

Further sequence analysis revealed the complete DNA sequence <SEQ ID 51>: 

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA 

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC 
1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACATC
1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC
1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC
1151 GTGCCGCCGC ACCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC
1201 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG ACGAGAAAAT
1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA
1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT
1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT
1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACTTAT CTGAACCCGC
1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCGTTGGTT
1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT
1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA


This corresponds to the amino acid sequence <SEQ ID 52; ORF11-1>:



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL
51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE
101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH
251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK
301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFASPL FWLLNQLHNI
351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD
401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV
501 FSVMFFFFPA GLVLYWVVNN LLTIAQQWHI NRSIEKQRAQ GEVVS*



Computer analysis of this amino acid sequence gave the following results: 

Homology with a 60kDa inner-membrane protein (accession P25754) of Pseudomonas putida

ORF11 and the 60kDa protein show 58% aa identity in 229 aa overlap (BLASTp).

ORFll 2 LYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLTIIVK 61 

LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K 
60K 324 LYAGPKIQSKLKELSPGLELTVDYGFLWFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIK 383 

ORF11 62 AVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRXXXXXXXXXLYTDEKINPLGGCLPM 121 
+ +PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+ 

384 GLFFPLSAASYRSMARMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPI 443 

181 



60K 
ORF11 
60K 
ORF11 
60K 



122 LLQI PVFIGL YWALFASVELRQAPWLGWIT DLSRADPYY I LP I IMAATMFAQT YLN P P PT 

L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P 
444 LVQMPVFLALYWVLLESVEMRQAPW ILWI T DL SI KDPFFI LP I IMGATMFIQQRLNPTP P 503 

182 DPMQAKMMKIMPLVXXXXXXXXPAGXVLYWWNNLLTI AQQWHINRS IE 230 

DPMQAK+MK+MP++ PAG VLYWWNN L+I+QQW+I R IE 

504 DPMQAKVMKMMPIIFTFFFLWFPAGLVLYWVVNNCLSISQQWYITRRIE 552 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF11 shows 97.9% identity over a 240aa overlap with an ORF (ORF11a) from strain A of N. meningitidis:



or f 11. pep 
orflla 

orfll.pep 
orflla 



10 20 30 

NLYAGPQTTSVIANIADNLQLAKDYGKVHW 
I I I II MM II I I I M I 11 I M M M I I 
IKRRNDKLYSTSVSVPLAAIQNGAKSXASINLYAGPQTTSVIANIADNLQLXKDYGKVHW 
280 290 300 310 320 330 

40 50 60 70 80 90 

FAS PL FWLLNQLHNI IGNWGWAI I VLT I IVKAVLYPLTNAS YRSMAKMRAAAPKLQAI KE 
Ml III | IMM Ml MM1MM1I MINIM MMM MMM Ml M MINIMI 
FAS PL FWLLNQLHNI IGNWGWAI IVLT I IVKAVLYPLTNAS YRSMAKMRAAAPKLQAIKE 
340 350 360 370 380 390 



100 110 120 130 140 150 

orfii.pep kygddrmaqqqammqlytdekinpix;gclpmllqipvfiglywalfasvei^qapwlgwi 
1 1 1 il I I ! 1 1 1 M 1 1 1 Ml II I II I It i Mi III 1 1 I M 1 1 1 1 II 1 1 I 1 1 1 1 1 II I II II 
orflla KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI 
400 410 420 430 440 450 

160 170 180 190 200 210 

or f 1 1 . pep TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLY 

I I I I I I I I I I I I 1 I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I MINIMI Mi 
orflla TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLY 

460 470 480 490 500 510 

220 230 240 

orf 11 .pep WWNNLLTIAQQWHI NRS IEKQRAQGE WSX 

II MM II Mill IIMIIIIIIM1II III 
orflla , WVINNLLT I AQQWHINRS IEKQRAQGE WSX 

520 530 540 

The complete length ORF11a nucleotide sequence <SEQ ID 53> is:



1 ANGGATTTTA AAAGACTCAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT
51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC
101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC
151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT
201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG
251 CAACCGGCGA CNAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAANAA
301 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG GCAACAACAT
351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG
401 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA
451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG
501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT
551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC
601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA
651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG
701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GATTGAACAC
751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG
801 CGCCGCTGGC GACTGCNGTA TNGACATCAA ACGCCGCAAC GACAAGCTGT
851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA
901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT
951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC
1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC
1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC
1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGTTCGATG GCGAAAATGC
1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC
1201 CGTATGGCGC AGCAACAAGC CATGATGCAG CTTTACACAG ACGAGAAAAT
1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA
1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT
1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT
1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC
1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCTTTGGTT
1501 NTNTCNNNNA NGTTCTTCNN CTTCCCTGCC GGTCTGGTAT TGTACTGGGT
1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA



This encodes a protein having amino acid sequence <SEQ ID 54>: 



1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL
51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX
101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH
251 HFMSTWILQP KGGQSVCAAG DCXXDIKRRN DKLYSTSVSV PLAAIQNGAK
301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI
351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD
401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV
501 XSXXFFXFPA GLVLYWVINN LLTIAQQWHI NRSIEKQRAQ GEVVS*



ORF11a and ORF11-1 show 95.2% identity in 544 aa overlap:



10 



20 



30 



40 



50 



60 
orfll-1

orflla.pep 
orfll-1 

orflla.pep 
orfll-1 

orflla.pep 
orfll-1 

orflla.pep 
orfll-1 

orflla.pep 
orfll-1 

orflla.pep 
orfll-1 

orflla.pep 
orfll-1 

orflla.pep 
orfll-1 
XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT

i mm I Mill HIM I Mill 11111111:11 Ml 1:111111111 HI MM 
MD^TAFFAI^VIMIGWEKMFPTPKPVPAPQQM 



10 



20 



30 



40 



110 



60 



120 



70 80 90 100 

DTVQAVIDEKSGDLRRLTLLK^ 

i m i M M II M I M I I 1 I I I M II M I MM Mill III I II MINIMUM 

80 90 100 HO 120 



70 



170 



180 



130 140 150 160 

IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTK^ 
I I I I I || 1 1 I II II II M I M M M I I M II I M II II M II I I I M I M II I ' ' ' 

UUap^qyslegd^ 

140 160 170 loO 



150 



230 



240 



190 200 210 220 

SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDD^ 

i ii m M II II M I I I II II I II I I M II M I II II I 1 II M I M II II M M II II 
SADTOITODHSEPEGQOT 

190 200 210 220 230 240 



290 



300 



250 260 270 280 

XTGWLGMIEHHFMSTWILQPKGGQSVCAAGDCXXDIKRR^ 
n m i m M M I I I II II I M 1111111:1 M I I M II M M II M II II M M M 

360 



250 



260 



270 



280 



350 



310 320 330 340 

SXAS INLYAGPQTTSVIANI ADNIiQIXKDYGKVHWFASPL^L^QL^ I IQjIWGWAI IV 
• II I I I! 1 1 M I M M I M II M II M II I I II M M M M M II II MM MM II I 
AEAS INLYAGPQTTSVIAN IADNLQLAKDYGKVHWFAS PLFWLLNQLHNIIGNWGWAI IV 
310 320 330 340 



350 



360 



410 



420 



370 380 390 400 

LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMA 

i i i M I M M 1 1 II I M M M M I M I M I I M I M M M M M 1 M II M It M 11 M I 

LTIIV^VLYPLTNASYRSMAKMRAAAPKLQ 

~" 380 390 400 410 420 



370 



470 



480 



430 440 450 460 

GGCLPMLLQI PVFIGL YWALFASVELRQAPWLGW IT DLSRAD PYY *^F1 1^^™^^* 
Til ii M I 111 I M II II I 11 1 II II II II I II II M I M M M II II I II I M 1 1 M II 

440 450 460 470 4bU 



430 



530 



540 



490 500 510 520 

LN PPPT D PMQAKMMKIMPLVXSXXFFXFPAGLVLYWVINNLLT I AQQWH INRS ^ ?????? 
i I I I i M II II I II II II I I I M II II II II M : II M II I M II I M M M II II 
LNPPPTDPMQAKM^ 

490 500 510 520 530 540 



530 



orflla.pep 
orfll-1 



GEWSX 
I 11 Ml 
GEWSX 



Homology with a predicted ORF from N. gonorrhoeae

ORF11 shows 96.3% identity over a 240aa overlap with a predicted ORF (ORF11.ng) from N. gonorrhoeae:

Orfll 
orfllng 



NLYAGPQTTSVIANIADNLQIA^^ 

I I III Mil II II M II II 1 M I II II II M II M II II I M I I II II I M II M II 
MAVNLYAGPQTTSVIANIM PLFWLLNQLHNI IGNWGWAIWLT 



57 
60 
orfll

orfllng 

orfll 

orfllng 

orfll 

orfllng 



1 1 VKAVLYPLTN AS YRSMAKMRAAAPBaQAIKEKYGDDRMAQQQAMMQLYTDEKIN PLGG 117 

I I II I I I I II I I I I I I I I I I I I I I I I : I I : I I I I I I I I I I 1 1 I I I I I I I : I I : I I I I I I 
IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFE 120 

CLPMLLQI PVFIGLYWALFASVELRQAPWLGWI TDLSRADPYYILPI IMAATMFAQTYLN 177 

I I I 1 1 I I I 1 1 1 1 1 1 1 i I 1 1 I 1 1 1 1 1 I M I II I I I I II I i I I I M I I I 1 1 I I I II I I ! It I 
CLPMLLQI PVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPI IMAATMFAQTYLN 180 

PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWWNNLLTIAQQWHINRSIEKQRAQGE 237 
I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I | I I I | | I | | | I | 

PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQGE 240 



WS 240 
1 I I 

WS 243 



orfll 
orfllng 

An ORF11ng nucleotide sequence <SEQ ID 55> was predicted to encode a protein having amino
acid sequence <SEQ ID 56>:



1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG
51 NWGWAIVVLT IIVKAVLYPL TNASYRSMAK MRAAAPELQT IKEKYGDDRM
101 AQQQAMMQLF EDEEINPLGG CLPMLLQIPV FIGLYWALFA SVELRQAPWL
151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMPLVFS
201 VMFFFFPAGL VLYWVVNNLL TIAQQWHINR SIEKQRAQGE VVS*



Further sequence analysis revealed the complete gonococcal DNA sequence <SEQ ID 57> to be: 



1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT
51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC
101 AACAGGCGGC ACAAAAACAG GGAGGAACCG CTTCCGCCGA AGCCGCGCTC
151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTTAT
201 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG
251 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA
301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT
351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG
401 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA
451 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG
501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT
551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC
601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA
651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg
701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac
751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg
801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt
851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca
901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT
951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG
1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC
1051 ATTATCGGCA ACTGGGGCTG GGCAATCGTC GTTTTGACCA TCATCGTCAA
1101 AGCCGTACTG TATCCATTGA CCAACGcctC CtACCGTTCG ATGGCGAAAA
1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCAAAGAAAA ATAcgGCGAC
1201 GACCGTATGG CGCAACAGCA AGCGATGATG CAGCTTTACA AAgacgAGAA
1251 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT
1301 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA
1351 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT
1401 CCTGCCCATC ATTATGGCGG CAACGATGTT CGCCCAAACC TATCTGAACC
1451 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG
1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG
1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGTGGCAC ATCAACCGCA
1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A



This encodes a protein having amino acid sequence <SEQ ID 58; ORF11ng-1>:

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK 

151 IDKVYTFTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH

251 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAIPTRGP 

301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN
351 IIGNWGWAIV VLTIIVKAVL YPLTNASYRS MAKMRAAAPK LQTIKEKYGD
401 DRMAQQQAMM QLYKDEKINP LGGCLPMLLQ IPVFIGLYWA LFASVELRQA
451 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL
501 VFSVMFFFFP AGLVLYWVVN NLLTIAQQWH INRSIEKQRA QGEVVS*

ORF11ng-1 and ORF11-1 show 95.1% identity in 546 aa overlap:

10 20 30 40 50 60 

orfllna-1 Pep MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQKQAATASAEAALAPATPITVTT 
ore ung p p ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, m : m s ,, | 1 1 1 M | , 1 1 1 1 i 1 1 1 
r>rfl 1-1 MDFKRLTAFFAI ALVIMI GWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPAT P I TVTT 

° r io 20 30 40 50 60 

70 80 90 100 110 120 

or f 1 lna- 1 pep DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG 

o ng * P P Ml || || I MM I I III 1 1 I I II : I I I I I I [ I I I I I I I I I I I I I I I I I I H 

^r-fll-1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 
° ri1 70 80 90 100 110 120 

130 140 150 160 170 180 

«r f 1 lna-1 Pep IGFSAPKKQYTLNGDTVEVRLSAPETNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANL 

° 9 ' P |||lllllll:l:ll III M HUllilllll II III Ml Mil tllll II I 

^rfll-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 
° rlX1 130 140 150 160 170 180 



orfl lng-1. pep 
orfll-1 



190 200 210 220 230 240 

SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQBCVSFSDLDDDAKSGKSEAEYIRKT 

MIMMIIIMIMIMMIMIIMMMIMIIIMMIMIIMI MM 

SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 

190 200 210 220 230 240 



250 260 270 280 290 300 

orfllna-1. pep PTGWLGMIEHHFMSTWILQPKGGQNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGP 

IMMMIMMMIMMMI I: Ml Ml I 1 1 I I ! I I I I I I : I I I I I I : M : J 
orfll-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQN-GA 

250 260 270 280 290 



orfllng-l.pep 
orfll-1 



310 320 330 340 350 360 

KPKMAVNLYAGPQTTSVIANIADNLQLAKDYGJCVHWFASPLFWLLNQLHNIIGNWGWAIV 

I : r r K 1 M I K I 1 I K I I 1 M I t I ! I I 1 1 I I M 1 M 1 I I I I I I I I M 1 I I 1 1 I I I I 1 1 i 1 

KAEAS INLYAGPQTTSVI AN I ADNLQLAKDYGKVHWFAS PL FWLLNQLHN I IGNWGWAI I 

300 310 320 330 340 350 

370 380 390 400 410 420 

orf 1 lna-1 Pep VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP 
o a P P I M II M I M II M I I II M II II 11 II II M : II II II II II II II II II II MUM 
orfll-1 VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIBCEKYGDDRMAQQQAMMQLYTDEKINP 
360 370 380 390 400 410 

430 440 450 460 470 480 

orfllna-1. pep LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPI IMAATMFAQT 

orf ling p p ,,,,,,,, ,,,,,,,,,,,,,,,,,,,, , I M II M II II II M M II ! M I 

orfll-1 LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGW IT DLSRADPYYILPI IMAATMFAQT 

420 430 440 450 460 470 

490 500 510 520 530 540 

orf Una- 1 . pep YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA 
9 P P , M II M I II II M II I M I M II II II II II II M I I II II II M II I II II M I II II 
orfll-1 YLN P PPTDPMQAKMMKIMPLVFS VMFFFFPAGLVLYWWNNLLT I AQQWHINRS I EKQRA 

480 490 500 510 520 530 

or f 1 lng- 1 . pep QGEWSX 
I I I I I I I 

orfll-1 QGEWSX 
540 

In addition, ORF11ng-1 shows significant homology with an inner-membrane protein from the
database (accession number p25754):
ID 60IM_PSEPU STANDARD; PRT; 560 AA.

AC P25754; 

DT 01-MAY-1992 (REL. 22, CREATED) 

DT 01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE) 

DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 

DE 60 KD INNER-MEMBRANE PROTEIN. . . . 






SCORES Initl: 1074 Initn: 1293 Opt: 1103 

Smith-Waterman score: 1406; 41.5% identity in 574 aa overlap 



orfllng-l.pep 
p25754 



orfllng-l.pep 



p25754 



10 20 30 40 

MDFKR LTAFFAIALVIMIGW EKMFPT r PKPVPAPQQAAQKQ 

11:11 : : I : : : : I : : : | : : I I I I | | : : : I : : 

MDIKRTILIAALAWSYVMVLKWNDDYGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 
10 20 30 40 50 60 

50 60 . 70 80 90 

AAT AS AE AALAPAT PI T VTTDTVQAVIDEKSGDLRRLTLLKYKATGDE-NKPF 

: : I : I I : : I : I : : I I I :: : : I I : I I : : I : I I I I : I I I 

VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 

70 80 90 100 110 120 






orfllng-l.pep 
p25754 



100 110 120 130 140 

VL FGDGKE YT YVAQSELLDAQGNN I LKGIG FSAPKKQYTL-NGD TVEVRLSAPE 

II :l I :|:IM I ::i : : : I ::| :|:| I :|: :|::::| 

QLFDNGGERVYLAQSGLTGTDGPDA-RASGRPLYAAEQKSYQLADGQEQLWDLKFS 

130 140 150 160 170 






orfllng-l.pep 
p25754 



orfllng-l.pep 



p25754 



orfllng-l.pep 
p25754 



150 160 170 180 190 200 

TNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANLSADYRIVRDHS-EPEGQGYF-THSY 

Ms: I ::| : I : I I : I I I I | : | : :: II I :| :: I :| 
DNGVNY IKRFS FKRGE YDLNVS YLI DNQSGQAWNGNMFAQLKRDASGDPS S STATGTAT Y 

180 190 200 210 220 230 

210 220 230 240 250 260 

VGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKTPTGWLGMIEHHFMSTWILQPKGG 
: I : : : I : : I I I : : I : I I : : : I : : I I : : : : | : | : : : | | | : 

LGAALWTASE P YKKVSMKD I D KGSLKE NV SGGWVAWLQHY FVTAWI - PAKS D 

240 250 260 270 280 

270 280 290 300 310 320 

QNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGPKPKMAVNLYAGPQTTSVIANIAD 
: I I :::::: | : : I : : : I : I I : : : | I I I | : | : : : : 

NNV VQTRKDSQGNYI IGYTGPVI SVPA-GGKVETSALLYAGPKIQSKLKELS P 

290 300 310 320 330 






330 340 350 360 370 380 

orfllng-l.pep NLQLAKDYGKVHWF-ASPLFWLLNQLHNI IGNWGWAI WLTI IVKAVLYPLTNASYRSMA 
: I : I : III : I I I : I : I I M : : : I : : : I I I I I : I : I II : : : I : : : : I I : I I I I I I I 
p25754 GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAASYRSMA 
340 350 360 370 380 390 






390 400, 410 420 430 440 

orf llng-1 . pep KMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF 
: II I : I i 1 1 ::M::||||: ::IM|:IM I I II I I 1 1 I II : I : I : I I I : : II I : I : 
p25754 RMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPILVQMPVFLALYWVLL 
400 410 420 430 440 450 






450 460 470 480 490 500 

orf llng-1 . pep ASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVF 
111:11111: IMMI II :: I I I I I I : I I I I I III I I I I I I I : I I : II : : I 
p25754 ESV^IMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPPDPMQAKVMKMMPIIF 
460 470 480 490 500 510 

510 520 530 540 

orf llng-1 . pep SVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQGEWSX 

: :|::||l!lilllllll l:|:lll:|:i II 
p2 5 7 5 4 T FFFLW FPAGLVLYWWNNCLS I SQQWY ITRRI EAATKKAAA 

520 530 540 550 560 
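The Smith-Waterman score reported above is a local-alignment score. As a minimal sketch of how such a score is computed, the Python below fills the Smith-Waterman dynamic-programming matrix row by row and returns its maximum. The function name `smith_waterman_score` and the simple match/mismatch/linear-gap scoring are assumptions chosen for illustration; the source does not state the substitution matrix or gap penalties used, so this sketch is not expected to reproduce the value 1406.

```python
def smith_waterman_score(a: str, b: str, match: int = 3,
                         mismatch: int = -3, gap: int = -2) -> int:
    """Return the best local-alignment score of a and b (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * cols       # current row; column 0 stays 0
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


# Toy nucleotide example (hypothetical input, not taken from the specification).
print(smith_waterman_score("TGTTACGG", "GGTTGACTA"))
```

With a residue substitution matrix in place of the fixed match/mismatch values, and affine gap penalties, the same recurrence underlies the scores reported by sequence-comparison programs.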
Based on this analysis, including the homology to an inner-membrane protein from P. putida and
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it
is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 8

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 59>: 

i rfCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 
SI "SS TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

nm cgStgccgc cgtcttgacc gncgctctgc tttccgcgct GGGTATTTNG 

\ll ScSJScG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

10 \l\ ggatSggat gccggacaat atgtcgaaat cctccgncac ACAGGCGGCA 

IV; ACCGTTACGA AGTT . TTTAT CGCGGTACG. ACTGGCAGGC TCAAAATACG 

H\ gggSIS AGCTTGAACC aggaactcgc gccctcattg tccgcaagga 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 60; ORF13>:

i jvttt ttkLLTG TVYT.T.WS AA LAGSGIAYGL TGSTPAAVLT XA LLSALGIX 

5 l --fiiljig VETDSYQDLD AGQYVE1LRH TGGNRYEVXY K GTXWQAQNT 
101 GQEELEPGTR ALIVRKEGNL LIITHP* 

Further sequence analysis elaborated the DNA sequence slightly <SEQ ID 61>:

1 rcCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

20 1 ■ TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 ScCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG 
TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

III GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA 

201 ??2^™ A AGTT TTtTAT CGCGGTACGc ACTGGCAGGC TCAAAATACG 

25 "1 ACCGTTACGA AGTT^^ gccctcattg TC CGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 62; ORF13-1>:

1 ftVT.T IELLTG TtrvT.T.WS RA LAGSGIAYGL TGSTPAAVLT XALLSALGIX 
~ ft 5 i " KS5£S VETDSYQDLD AGQYVEILRH TGGNRYEV^ K GTHWQAQNT 

SV 10 1 GQEELEPGTR ALIVRKEGNL LIITHP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF13 shows 92.9% identity over a 126aa overlap with an ORF (ORF13a) from strain A of N.
meningitidis:






orfl3.pep 
orfl3a 



10 20 30 40 

fltrr TT PT.T.TfJTWT.T.WSAALAGSGIA YGLTGSTPAAVLTXALLSALGIXF 

T~i I I i I I I I I I I I I I | | | | | | | I | I I I I I I I I I I I I I I I I I I I I I I I I I 

MTVMyAAVAVL^ 

775 on 30 40 =u DU 



60 70 80 90 100 1X0 

VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYR^ 

VHMCTAVGKVETDSYQDLDAGQY 

70 80 90 100 HO m> 



or f 13. pep 
45 orfl3a 



120 

orf13.pep LIVRKEGNLLIITHPX
          ||||||||||||::||
orf13a    LIVRKEGNLLIIAKPX
130 

The complete length ORF13a nucleotide sequence <SEQ ID 63> is: 

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG 

251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG 

351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA 

401 AACCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 64>: 

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR 
101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP* 
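The correspondence between the ORF13a nucleotide sequence <SEQ ID 63> and its protein <SEQ ID 64> can be checked by translating the DNA in reading frame 1. The following is a minimal Python sketch under the assumption that the standard genetic code applies; the function name `translate` and the constant names are illustrative and are not taken from the specification, which does not say what software performed the translation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored and '*' marks a stop."""
    seq = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# ORF13a nucleotide sequence, transcribed from SEQ ID 63.
ORF13A_DNA = """
ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT
GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG
GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC
GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT
GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG
CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC
GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG
AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA
AACCTTAA
"""

print(translate(ORF13A_DNA))
```

The 408 nucleotides give 135 residues followed by a stop codon, which is shown as `*` in the protein listing.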

ORF13a and ORF13-1 show 94.4% identity in 126 aa overlap 

10 20 30 40 50 60 

or f 1 3a . pep MTVWFVAAVAVL I IELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 

t I I I I I I E f i 1 I I 1 J I I I 1 I I I I I 1 I I I I t I I I I I I t I 1 I III ill I I I 
or f 1 3- 1 AVLI I E LLTGTVYLLWS AALAGSGIAYGLTGST PAAVLTXALLS ALG IXF 

10 20 30 40 50 

70 80 90 100 110 120 

orf 13a . pep VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 
I I I I I I I I I I I I I II I I I I I I I : I I I I I : I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
orf 13-1 VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 

60 70 80 90 100 110 



130 

orf 13a. pep LIVRKEGNLLIIAKPX 
I I I I I I I I I 1 1 I :: I I 
orfl3-l LIVRKEGNLL I ITHPX 

120 
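The identity figure quoted for this alignment can be reproduced by counting matching columns. The sketch below is a minimal Python illustration: the function name `percent_identity` is hypothetical, the two strings are transcribed from the alignment above (ORF13a from residue 10 onwards, ORF13-1 in full, without the terminal stop symbol), and treating the undetermined residues `X` as mismatches is an assumption that happens to agree with the reported value.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of columns in two equal-length aligned strings that match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as identical only if both residues agree and are not gaps.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Overlapping region of ORF13a (residues 10-135) and ORF13-1 (residues 1-126).
ORF13A_OVERLAP = (
    "AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF"
    "VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA"
    "LIVRKEGNLLIIAKP"
)
ORF13_1 = (
    "AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF"
    "VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA"
    "LIVRKEGNLLIITHP"
)

identity = percent_identity(ORF13A_OVERLAP, ORF13_1)
print(f"{identity:.1f}% identity in {len(ORF13_1)} aa overlap")
```

Seven of the 126 columns differ (two of them are `X` positions in ORF13-1), giving 119/126.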



Homology with a predicted ORF from N.gonorrhoeae 

ORF13 shows 89.7% identity over a 126aa overlap with a predicted ORF (ORF13.ng) from N. 
gonorrhoeae: 



orfl3 

orfl3ng 

orfl3 

orfl3ng 

orfl3 

orf!3ng 



AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 
I I I I I I i I I I I I I I I I I I I I I I I I I M I 1 I I I I I I II I I I I I I I I 1 I I I 
MTVWFVAAVAVLI IELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 



51 



60 



111 



VHAKTAVRKVET DS YQDL DAGQYVE I LRHTGGNRYEVXYRGTXWQAQNTGQEELE PGTRA 
IMIII! I I I I I I I I II 1 : I : I : I I I I : I 1 I I M I I II I I I I I 1 I I I I I : I I I I 1 I 
VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 120 

LIVRKEGNLLI ITHP 126 

lllllllll lll::l 
LIVRKEGNLLI I AN P 135 



The complete length ORF13ng nucleotide sequence <SEQ ID 65> is:



1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG

101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC

151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG

251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC

301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG

351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA

401 ACCCTTAA

This encodes a protein having amino acid sequence <SEQ ID 66>:

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA

51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR

101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP*

ORF13ng shows 91.3% identity in 126 aa overlap with ORF13-1: 

10 20 30 40 50 

AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 

orfl3-l.pe P ?, ,|| llllllllilllllllllllMIIIIMI I I IHIII I 

orfl3ng MTVWFVAAVAVLIIELLTGTVYLLWSAA^GSGIAYGLTGSTPAAVLTAALLSALGIWF 

60 70 80 90 100 110 

orf 13-l.pe P ™™-^^ 

orfl3na ^AVG^TDSYQDLDTGKYAEII^YTGGNRYEVFYRGTHWQAQNTGQEVFEPGT^ 
orzunq 7Q 80 9Q 100 110 120 

120 

orf 13-1. pep LIVRKEGNLLI ITHPX 
° P II III I III II I- I I 

orfl3ng LIVRKEGNLLIIANPX 

130 , 

Based on this analysis, including the extensive leader sequence in this protein, it is predicted that
ORF13 and ORF13ng are likely to be outer membrane proteins. It is thus predicted that the proteins
from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines
or diagnostics, or for raising antibodies.

Example 9 

The following DNA sequence was identified in N.meningitidis <SEQ ID 67>: 

1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC 

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT 

401 ATGCCGTC . . 

This corresponds to the amino acid sequence <SEQ ID 68; ORF2>: 

1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF 
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 
101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV. . 
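The partial sequence <SEQ ID 67> contains IUPAC ambiguity codes (for example w, r, s and y), and the corresponding protein shows X at some positions but a definite residue at others. One convention consistent with this is sketched below in Python: a codon is translated to a definite residue when every base combination it allows encodes the same amino acid, and to X otherwise. The function name `translate_ambiguous` and this convention are assumptions for illustration; the specification does not describe how X was assigned, and positions marked "." in the source are not handled here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate frame 1; emit 'X' when an ambiguous codon is not synonymous."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Every concrete codon compatible with the ambiguity codes.
        expansions = product(*(IUPAC[base] for base in codon))
        residues = {CODON_TABLE["".join(c)] for c in expansions}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 100 nucleotides of SEQ ID 67, as printed in the specification.
ORF2_START = """
ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT
GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC
"""

print(translate_ambiguous(ORF2_START))
```

Here TTr (TTA or TTG) and GAr (GAA or GAG) resolve to L and E, whereas TwT (TTT or TAT) could be F or Y and is therefore reported as X.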

Further work revealed the complete nucleotide sequence <SEQ ID 69>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA

351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG 

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 
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551 AAGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT 
601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA 
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This corresponds to the amino acid sequence <SEQ ID 70; ORF2-1>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT

201 SLRKQAISRK RDFRPKHRAK PKLRVRKS* 

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 71>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 

551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CCCCTGTTCC GCATACCACT 

601 TCGCTGCGTA AACAGGCAAT AAGCCGCAAA CGCGATTTGC GTCCTAAATC 

651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This encodes a protein having amino acid sequence <SEQ ID 72; ORF2a>: 

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT

201 SLRKQAISRK RDLRPKSRAK PKLRVRKS* 

The originally-identified partial strain B sequence (ORF2) shows 97.5% identity over a 118aa 
overlap with ORF2a: 

10 20 30 40 50 60 

or f 2 . pep MX DFGLGELVFVGIIALIVL GPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 

t I 1 I I I I I I I I I I HI I I I I i I I I I I I I : I I I I I I I I I M I I 1 1 I I I i I I I I I I I 1 I I 
or f2a MFD FGLGELVFVGIIALIVL GPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 

10 . 20 30 40 50 60 

70 80 90 100 110 120 

orf 2 .pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 

II I I I M I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 2a KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 

70 80 90 i 100 110 120 

130 

orf 2. pep RCGKHPIRRHFRRYAV 

orf 2a DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPW 
130 140 150 160 170 180 

The complete strain B sequence (ORF2-1) and ORF2a show 98.2% identity in 228 aa overlap: 

or f 2a . pep MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 

1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 II 1 1 1 1 II 1 1 1 II II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 2-1 MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 

orf 2a .pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 120 

I I I I I II II II I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I II I I I I I I I I I I I: I 
orf 2-1 KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120 

orf 2a. pep DAANT LLDG I S DVMPSERS Y AS AETLG DSGQTG ST AE PAET DQ DRAWRE YLT AS AAAPW 180 

I I || | | | | I I I I I I I I I I I I I I I I I I I I II I I I I M I I I I I I I I II I I II I | I I | I | I I . 
orf2-1 DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180

229 



^<ro a neo QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX 
orf2a.pep , , , ,, , , , , , , , | | || I I I I II I 1 1 I M I I : I I I II III III MM 

orf2-l QTVEVSYI DTAVET PVPHTTSLRKQAI SRKRDFRPKHRAKPKLRVRKSX ^ 

Further work identified a partial DNA sequence <SEQ ID 73> in N.gonorrhoeae encoding the 
following amino acid sequence <SEQ ID 74; ORF2ng>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL 
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK
101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV* 

Further work identified the complete gonococcal gene sequence <SEQ ID 75>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC 

101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT 

151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA 

251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa 

351 tccccttCCC gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA

401 TGCCGTCTGA ACGTTCCGAT ACTtccgcCG AAACCCTTGG GGACGACAGG

451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG

501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg

551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc

601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA

651 ACACCGCGCc aAACCGAAat tgcgcgtcCG TAAATCATAA

This encodes a protein having the amino acid sequence <SEQ ID 76; ORF2ng-1>:

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL 

51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR

151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT

201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 87.5% identity over a 136aa 
overlap with ORF2ng: 

IS orf2 Pep MXDreLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIEL^^ 60 

35 orf2.pep | | | | I | I | : I I t 1 I I 1 I I I I M I I I I I I : I I I I I I I I I I I I I I I I II : I M I I I I I I I 

orf2ng MFDreLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 



«rf2 oeP KAKQE FEAAAAQVRDSLKETGTDMEGNLHDI S DGLKPWEKLPEQRT PAD FGV DENGN PXS 

orf2.pep karu , , , , , , , , ,, , | 1 1 ;:: 1 1 1 1 1 | | | | | | | 1 1 | | I I 1 1 I I M I 1 1 : 1 1 
orf2ng ^KQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDEKGNSLP 



orf2 pep RCGKH P IRRHFRRY AV 136 

I ill I 1 I I I I I I 1 K 
45 orf2ng RYGKHRI RRHFRRYAV 136 

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-1) show 91.7% identity in
229 aa overlap:






10 20 30 40 50 60 

orf2-l.pep MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRW 

orf2ng-l MF DreLGELi^GiiMIVLGPERLPE 



55 orf2-l.pep 
orf2ng-l 



I I I | | I 1 M : | | I I 1 I I I II I I I I I I I II I I I I I I I I I I I M I I I II II : I I I I I I I I I I 
''LI FVG I IALIVLGPERL PEAARTAGRL IGRLQRFVGS VKQELDTQIELEELR 
10 20 30 40 50 60 

70 80 90 100 110 120 

KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 

|:|| | 1 1 | 1 1 1 1 1 1 I I 1 t I III:::MIIMIIMIIIIIIIIIMIIIMIIMMI 
KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP
70 80 90 100 110 120

130 140 150 160 170 180 

orf 2-1 . pep DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETEX3DRAWREYLTASAAAPW 
5 l:|||:Mllllltltill itllllll: i I I I I I I I I I I I : I I I I I I I I I I | I 1 1 | | | 

orf2ng-l DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPW 

130 140 150 160 170 180 

190 200 210 220 229 

10 orf2-l.pep Q-TVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 

I • I I 1 I I 1 I I I I I I I t i I t 1 I I I I I 1 I r 1 I I | | I I I I II I I I I | | M I 
or f 2ng- 1 QRAVEVS YI DTAVET PVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKSX 

190 200 210 220 230 

Computer analysis of these amino acid sequences indicated a transmembrane region (underlined),
and also revealed homology (34% identity, 59% positives over 88 residues) between the gonococcal
sequence and the TatB protein of E.coli:






gnl|PID|e!2921Bl (AJ005830) TatB protein [Escherichia coli] Length = 171
Score = 56.6 bits (134), Expect = 1e-07

Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%)

Query: 1 MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 

MFD G EL+ V II L+VLGP+RLP A +T I L+ +V+ EL +++L+E + 
Sbjct: 1 MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60



Query: 61 -KVKQAFEAAAAQVRDSLKETDTDMQNS 87

+K+ +A+ + LK + +++ + 
Sbjct: 61 DSLKKVEKASLTNLTPELKASMDELRQA 88 
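In this output, "Identities" counts columns where query and subject carry the same residue, while "Positives" additionally counts conservative substitutions, which are marked "+" in the middle line. The Python sketch below illustrates that bookkeeping. The function names are hypothetical, and the ten-column example restores the column spacing of the start of the alignment above under the assumption that each blank in the middle line corresponds to one non-conservative mismatch (the extracted text has lost the original spacing).

```python
def midline_counts(midline: str) -> tuple[int, int, int]:
    """Return (identities, positives, alignment length) from a BLAST mid-line."""
    identities = sum(ch.isalpha() for ch in midline)  # letters mark identical residues
    positives = identities + midline.count("+")       # '+' marks conservative changes
    return identities, positives, len(midline)


def as_percent(count: int, total: int) -> int:
    """Percentage rounded to the nearest integer, as shown in the report."""
    return round(100 * count / total)


# First ten columns of the Query/Sbjct alignment, with spacing restored.
QUERY   = "MFDFGLGELI"
MIDLINE = "MFD G  EL+"
SBJCT   = "MFDIGFSELL"

print(midline_counts(MIDLINE))

# Percentages derived from the counts reported in the specification.
print(as_percent(30, 88), as_percent(52, 88), as_percent(1, 88))
```

The reported counts of 30, 52 and 1 out of 88 columns correspond to the 34%, 59% and 1% shown in the output.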

Based on this analysis, it was predicted that ORF2, ORF2a and ORF2ng are likely to be membrane
proteins and so the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF2-1 (16kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 3A
shows the results of affinity purification of the GST-fusion protein, and Figure 3B shows the results
of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise mice,
whose sera were used for Western blots (Figure 3C), ELISA (positive result), and FACS analysis
(Figure 3D). These experiments confirm that ORF37-1 is a surface-exposed protein, and that it is
a useful immunogen.

Example 10 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 77>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC

51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT 

101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG TCGCTACTCC 

251 ATTGATGCAC JcGrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC

301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG 

351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC 

401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA 

451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC 
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501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC 
551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA 
601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG „ . 

This corresponds to the amino acid sequence <SEQ ID 78; ORF15>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEM..

Further work revealed the complete nucleotide sequence <SEQ ID 79>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC 

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA 

951 AGGACAACCT TGA 

This corresponds to the amino acid sequence <SEQ ID 80; ORF15-1>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA 

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN 

301 SHEGYGYSDE VVRQHRQGQP*

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 81>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 82; ORF15a>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA 

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN 

301 SHEGYGYSDE AVRRHRQGQP * 

The originally-identified partial strain B sequence (ORF15) shows 98.1% identity over a 213aa 
overlap with ORF15a: 

10 20 30 40 50 60 

MQARLLI PI LFS VFILSA CGTLTGI PSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I 
MQARLLI PILFSVFILSA CGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 
10 20 30 40 50 60 

70 80 90 100 110 120 

KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
I I I I 1 1 I I I 1 1 ! I 1 1 i I I I t I I I I I 1 II Ml Ml II I i M III Ml I III MM II! 
KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
70 80 90 100 110 120 



orflS.pep 
orflSa 

orflS.pep 
orflSa 



130 140 150 160 170 180 

or f 1 5 . pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I II I I I I 
orflSa LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 

130 140 150 160 170 180 

190 200 210 

orflS.pep FLRG I DWS PANADTDVFINI DVFGT IRNRTEM 
I I I I I I II I I I I I II I I II I I I I I I I I I I I I I I 
orflSa FLRG I DWS PAN ADT DVFIN I DVFGT IRNRTEMHLYN AETLKAQTKLE YFAVDRTNKKLL 

190 200 210 220 230 240 

The complete strain B sequence (ORF15-1) and ORF15a show 98.8% identity in 320 aa overlap: 

10 20 30 40 50 60 

orflSa . pep MQARLLI PI LFSVFILSACGTLTG I PSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
orf 15-1 MQARLLI PILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 15a . pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 15-1 KVALYIATMG DQG S GS LTGGRY S I DAL I RGE Y INS PAVRT DYT Y PRYETTAETT SGGLTG 

70 80 90 100 110 120 

130 140 150 160. 170 180 

orf 15a . pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 m 1 1 m 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 r 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 15-1 LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 

130 140 ISO 160 170 180 

190 200 210 220 230 240 

orflSa . pep FLRG I DVVS PANADTDVFINI DVFGT IRNRTEMHLYN AETLKAQTKLE YFAVDRTNKKLL 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
or f 1 5- 1 FLRGIDWS PANADTDVFINI DVFGT IRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 

190 200 210 220 230 240 

250 260 270 280 290 300 

orflSa . pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHMGNSAPSVEADN 
I I 11 I II I I I I I I I I I I I I I I I I I I I I i I I I I I I i I I I M I !: I I I I I lllllllltll 
orf 15-1 IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN 

250 260 270 280 290 300 



orf 15a. pep 
orfl5-l 



310 320 
SHEGYGYSDEAVRRHRQGQPX 
I I I I i II I I I: I I : I I I I i I I 
SHEGYGYSDEVVRQHRQGQPX
310 320

Further work identified the corresponding gene in N.gonorrhoeae <SEQ ID 83>: 

1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT 

101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA

251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC 

301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 84; ORF15ng>: 

1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN

301 SHEGYGYSDE AVRQHRQGQP *
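The relationship between a nucleotide listing such as SEQ ID 83 and the protein it encodes can be checked with a short script. The following Python sketch is illustrative only and is not part of the original disclosure. It assumes the standard genetic code; the helper name `translate` is hypothetical, and codons containing ambiguous or missing bases are reported as `X`.

```python
# Illustrative sketch (not from the original disclosure): translate a DNA
# coding sequence with the standard genetic code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table from the compact TCAG-ordered string.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base; unknown codons become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spaces used in the listings
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 83 as listed above.
print(translate("ATGCGGGCAC GGCTGCTGAT ACCTATTCTT"))  # MRARLLIPIL
```

The stop codon is rendered as `*`, matching the convention used in the protein listings.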

The originally-identified partial strain B sequence (ORF15) shows 97.2% identity over a 213aa
overlap with ORF15ng:

orf15.pep   MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR  60
            |:|||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR  60

orf15.pep   KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG  120
            ||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG  120

orf15.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF  180
            |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF  180

orf15.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM  213
            |||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL  240

The complete strain B sequence (ORF15-1) and ORF15ng show 98.8% identity in 320 aa overlap:

10 20 30 40 50 60

orf15-1.pep  MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
             |:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng      MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR

10 20 30 40 50 60

70 80 90 100 110 120

orf15-1.pep  KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng      KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG

70 80 90 100 110 120

130 140 150 160 170 180

orf15-1.pep  LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
             |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf15ng      LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF

130 140 150 160 170 180

190 200 210 220 230 240

orf15-1.pep  FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng      FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL

190 200 210 220 230 240

250 260 270 280 290 300

orf15-1.pep  IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN
             ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf15ng      IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN

250 260 270 280 290 300

310 320

orf15-1.pep  SHEGYGYSDEVVRQHRQGQPX
             ||||||||||:||||||||||
orf15ng      SHEGYGYSDEAVRQHRQGQPX

310 320
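The identity percentages quoted for these alignments are the fraction of aligned positions that carry the same residue. A minimal Python sketch of that arithmetic follows; it is not the alignment program used in the original work, the function name `percent_identity` is hypothetical, and it assumes the two sequences have already been aligned to equal length with `-` marking gaps.

```python
# Illustrative sketch: percent identity between two pre-aligned sequences.
# Assumes equal-length inputs; '-' marks a gap and never counts as a match.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


# First 60 residues of ORF15-1 and ORF15ng (one difference, Q vs R).
orf15_1 = "MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR"
orf15ng = "MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR"
print(round(percent_identity(orf15_1, orf15ng), 1))  # 98.3
```

By the same arithmetic, four differing positions in a 320 aa overlap would give 316/320 = 98.75%, which is consistent with the reported 98.8%.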

Computer analysis of these amino acid sequences reveals an ILSAC motif (putative membrane 
lipoprotein lipid attachment site, as predicted by the MOTIFS program). 

indicates a putative leader sequence, and it was predicted that the proteins from N.meningitidis and
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
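A motif search of this kind can be approximated with a regular expression. The Python sketch below is illustrative only: the pattern is an assumed, simplified lipobox-style signature (six residues other than D, E, R or K, then three small or hydrophobic residues, then A, G or S, then the lipid-modified cysteine), and the rule actually applied by the MOTIFS program may differ. The function name is hypothetical.

```python
import re

# Assumed pattern (simplified lipoprotein lipid attachment signature);
# the rule actually applied by the MOTIFS program may differ.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipid_attachment_site(protein: str):
    """Return (last five residues of the match, 1-based Cys position) or None."""
    match = LIPOBOX.search(protein)
    if match is None:
        return None
    return match.group()[-5:], match.end()


n_terminus = "MRARLLIPILFSVFILSACGTLTGIPSHGG"  # start of SEQ ID 84 (ORF15ng)
print(find_lipid_attachment_site(n_terminus))  # ('ILSAC', 19)
```

Under these assumptions the cysteine of the ILSAC motif is found at position 19, which is where a lipid anchor would be attached after removal of a leader peptide.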

ORF15-1 (31.7kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
4A shows the results of affinity purification of the GST-fusion protein, and Figure 4B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 4C) and ELISA (positive result). These
experiments confirm that ORF15-1 is a surface-exposed protein, and that it is a useful immunogen.
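A molecular weight such as the 31.7kDa quoted for ORF15-1 can be estimated from the amino acid sequence by summing average residue masses and adding one water molecule. The Python sketch below is illustrative only; the function name is hypothetical, the residue masses are standard average values, and the estimate ignores fusion tags (GST, His), leader-peptide cleavage and lipid modification, so it need not reproduce the figure in the text exactly.

```python
# Illustrative sketch: estimate the average molecular mass of a protein.
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER = 18.0153


def average_mass(protein: str) -> float:
    """Sum residue masses plus one water; spaces and '*' are ignored."""
    residues = protein.replace(" ", "").replace("*", "").upper()
    return sum(RESIDUE_MASS[r] for r in residues) + WATER


# Example: first ten residues of the ORF15ng listing (SEQ ID 84).
print(round(average_mass("MRARLLIPIL") / 1000.0, 2), "kDa")
```

Ambiguous residues such as `X` raise a `KeyError` here by design, since no mass can be assigned to them.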

Example 11 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 85>: 

1 ..GG.CAGCACA AAAAACAGGC GGTTGAACGG AAAAACCGTA TTTACGATGA

51 TGCCGGGTAT GATATTCGGC GTATTCACGG GCGCATTCTC CGCAAAATAT

101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT

151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC

201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG

251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA

301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT

351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT

401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT

451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA

501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC.TT CGGCATTATG

551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA

This corresponds to the amino acid sequence <SEQ ID 86; ORF17>: 



1 ..GQHKKQAVNG KTVFTMMPGM IFGVFTGAFS AKYIPAFGLQ IFFILFLTAV
51 AFKTLHTDPQ TASRPLPGLP XLTAVSTLFG TMSSWVGIGG GSLSVPFLIH
101 CGFPAHKAIG TSSGLAWPIA LSGAISYLLN GLNIAGLPEG SLGFLYLPAV
151 AVLSAATIAF APLGVKTAHK LSSAKLKKSF GIMLLLIAGK MLYNLL*

Further work revealed the complete nucleotide sequence <SEQ ID 87>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 TC.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT

801 GCTTTAA

This corresponds to the amino acid sequence <SEQ ID 88; ORF17-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 XFGIMLLLIA GKMLYNLL*

Computer analysis of this amino acid sequence gave the following results:
Homology with hypothetical H.influenzae transmembrane protein HI0902 (accession number P44070)
ORF17 and HI0902 proteins show 28% aa identity in 192 aa overlap:

ORF17 3 HKKQAVNGKTVFTMMPGMIFGVFT-GAFSAKYIPAFGLQIFFILFLTAVAFKTLHTDP 59

HK + + V + P++ VF GF + +IF +++L ++ D

HI0902 72 HKLGNIVWQAVRILAPVIMLSVFICGLFIGRLDREISAKIFACLWYLATKMVLSIKKD- 130

ORF17 60 QTASRPLPGLPXLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPI 119

Q ++ L L + L G SS GIGGG VPFL G +AIG+S+ +

HI0902 131 QVTTKSLTPLSSVIG-GILIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLL 189

ORF17 120 ALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVXXXXXXXXXXXXXX 179
+SG S++++G +PE SLG++YLPAV ++A + + LG

HI0902 190 GISGMFSFIVSGWGNPLMPEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKG 249

ORF17 180 FGIMLLLIAGKM 191
F + L+++A M
HI0902 250 FALFLIWAINM 261

Homology with a predicted ORF from N.meningitidis (strain A)

ORF17 shows 96.9% identity over a 196aa overlap with an ORF (ORF17a) from strain A of N.
meningitidis:

10 20 30 

non GOHKKOAVNGKT VFTMMPGMI FGVFTGA FS 

orfl7 ' peP I 111 I Ml: III II llll l:HI I : M r 1 



50 



55 



orfl7a ot:t .aoh P Y AQHL AVGTS FAVMVFTAFS SML GQHKKQAV DWKT VFTMM PGMVFGVFAGALS 

50 60 70 80 90 100 

40 50 60 70 80 90 

orf17.pep  AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG
           |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf17a     AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG
110 120 130 140 150 160

100 110 120 130 140 150 

orf 17 . pep GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 

1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 i M i 1 1 M 1 1 1 1 1 1 1 n 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 iTT 

orf 17a GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 
170 180 190 200 210 220 

160 170 180 190 

orf 17 . pep AVLSAATIAFAPLGV KTAHKLSSAKLKKS FGIMLLLIAGKMLYKLL X 
MIIMIM III MM I MM Mil! I I I Mil II II Mill Mil! 
orfl7a AVLSAATIAFAPLGVK TAHKLSSAKLKKS FGIMLLLIAGKMLYNLLX 
230 240 250 260 

The complete length ORF17a nucleotide sequence <SEQ ID 89> is: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This encodes a protein having amino acid sequence <SEQ ID 90>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 SFGIMLLLIA GKMLYNLL*

ORF17a and ORF17-1 show 98.9% identity in 268 aa overlap: 

10 20 30 40 50 60 

orf 17a . pep MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPWLWVLDLQGLAQHPYAQHLAVGTSF 

1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 

orf 17-1 MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPWLWVLDLQGLAQHPYAQHLAVGTSF 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 17 a . pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALSAKYIPAFGLQI FFILFLT 

I I M I I I II I I I I I I I 11 I I I I I I : I I I I : I I | I | I M I I I I I I II I I I I I I 

or f 1 7 - 1 AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMI FGVFTGALSAKY I PAFGLQI FFI LFLT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 17a. pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 
MIMIIMII Ml MMMM MMM III Ml I II II M I I I I III MM IMM III 
orf 17-1 AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 17a . pep IGT S SGLAWP I ALSGAI S YLLNGLN I AGLPEG SLG FLYL PAVAVLSAAT I AFAPLGVKTA 

I I I I I I I I I I I I I I t I I I i I I I I i I f I t I I 1 I I I I I I I I J t I I I I I t I I I I i I | t f I | 1 | 
orf 17-1 IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPI/3VKTA 

190 200 210 220 230 240 



250 260 269

orf17a.pep  HKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||| ||||||||||||||||||
orf17-1     HKLSSAKLKKXFGIMLLLIAGKMLYNLLX

250 260

Homology with a predicted ORF from N.gonorrhoeae

ORF17 shows 93.9% identity over a 196aa overlap with a predicted ORF (ORF17.ng) from N. 
gonorrhoeae: 

n^o GQHKKQAVNGKTVFTMMPGMI FGVFTGAFS 30 

orfl7.pep MIMIII: I I : I : I II I I I I I I I : I I : I 



orf 17ng 



QGLAQH P YAQHLAVGT S FAVMVFTAFS SMLGQHKKQAVDWKT I FAMMPGMI FGVFAGALS 



102 



orf 17 oeo AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG 90 

' P P Ml Ml | MM II II I II II U U II I Mil II I I II I I hi mini I _ 

orfl7ng AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG 162 



orf 17. pep 



GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 150 
M I II I I II I I M 1 1 M II II M I II II II M II I II I : M I I II M M II M II M I M 
orfl7ng GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV 202 

orf 17 pep AVLSAATI AFAPLGVKTAHKLSS AKLKKS FG IMLLLI AGKMLYNLL 196 

|| I I i M M M II I I M II II M 1 M 1 : M II I I I M II I II M i I 
orfl7ng AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL 268 

An ORF17ng nucleotide sequence <SEQ ID 91> is predicted to encode a protein having amino acid 
sequence <SEQ ID 92>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE

251 SFGIMLLLIA GKMLYNLL*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 93>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cggcgGTACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CAcaTccttc gcCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG 

401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA 

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 94; ORF17ng-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE

251 SFGIMLLLIA GKMLYNLL*

ORF17ng-1 and ORF17-1 show 96.6% identity in 268 aa overlap:

10 20 30 40 50 60 



orf17-1.pep  MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17ng-1    MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF

10 20 30 40 50 60

70 80 90 100 110 .120 

o r f 1 7 - 1 . pep AVMVFT AFS SMLGQHKKQAVDWKTVFTMMPGMI FGVFTGALS AKYI PAFGLQI FFILFLT 
I I I I I I I I I I I I I I I I I I I I I I I I : I : I I I I I I I I M : I I II I I I I I I I I I I I I I I I I I I 
o r f 17 ng- 1 AVMVFTAFSSMLGQHKKQAVDWKT I FAMMPGMI FGVFAGALSAKYI PAFGLQI FFILFLT 

70 80 90 100 110 120 

130 . 140 150 160 170 180 

orf 17-1 . pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I II I I I I I I I I I I I I I I I I I I I I I I I 
orfl7ng-l AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 17-1 . pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 
MM IIIIIIMMIMM 1:111111 II I Mill MM Mill I I I MM Mill MM 
orfl7ng-l IGTSSGIAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 

190 200 210 220 230 240 

250 260 269 

orf 17-1 . pep HKLSSAKLKKXFGIMLLLIAGKMLYNLLX 
MINIMI: I I I M I II I I I ! M II M 
orfl7ng-l HKLSSAKLKESFGIMLLLIAGKMLYNLLX 

250 260 

In addition, ORF17ng-1 shows significant homology with a hypothetical H.influenzae protein:

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae
predicted coding region HI0902 [Haemophilus influenzae] Length = 264

Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23

Identities = 15/43 (34%), Positives = 23/43 (53%)

Query: 55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97

A+GTSFA +V T S HK + W+ + + P ++ VF
Sbjct: 52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94

Score = 195 (91.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 44/114 (38%), Positives = 65/114 (57%)

Query: 150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209

L G SS GIGGG VPFL G +AIG+S+ + +SG S++V+G +

Sbjct: 148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGWGNPLM 207

Query: 210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263

PE SLG++YLPAV ++A + + LG KL + LK+ F + L+++A M

Sbjct: 208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIWAINM 261

This analysis, including the homology with the hypothetical H.influenzae transmembrane protein,
suggests that the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 12 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 95>: 

1 ..GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC 

51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG 

101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG 

151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT 

201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC 

251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG 
301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA
351 A 

This corresponds to the amino acid sequence <SEQ ID 96; ORF18>: 

1 GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL 
51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA 
101 LMQVSVLVLL LSEIGR* 

Further work revealed the complete nucleotide sequence <SEQ ID 97>: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This corresponds to the amino acid sequence <SEQ ID 98; ORF18-1>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP

51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL

101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ

151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG

201 R*
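The partial sequence (ORF18) and the complete sequence (ORF18-1) are related by a simple offset, which can be found with a substring search. The Python sketch below is illustrative only and not part of the original disclosure; the function name `locate_partial` is hypothetical, and the sequences are the opening residues of the listings above with spacing removed.

```python
# Illustrative sketch: locate a partial protein sequence inside the
# complete sequence and report its 1-based start position.
def locate_partial(partial: str, complete: str) -> int:
    """Return the 1-based start of `partial` in `complete`, or 0 if absent."""
    partial = "".join(partial.split())
    complete = "".join(complete.split())
    return complete.find(partial) + 1  # str.find returns -1 when not found


# First 100 residues of ORF18-1 and first 20 residues of ORF18.
orf18_1_start = (
    "MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP "
    "GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL"
)
orf18_start = "GNGWQADPEH PLLGLFAVSN"[:15]  # stay within the 100-residue window

print(locate_partial(orf18_start, orf18_1_start))  # 86
```

A start at residue 86 of a 201-residue protein leaves 116 residues, matching the length of the overlap reported for ORF18.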

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N.meningitidis (strain A)

ORF18 shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) from strain A of N.
meningitidis:



or f 18. pep 
orf!8a 



10 20 30 

nNttWOAnPEHPLLGLF AVSNVSMTLAFVGI 

I i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

TRAAPLFI PHFYLTLGSIFFFI GHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI 
60 80 90 100 110 



40 50 60 70 80 90 

rzvT.VHvrF^CTVOVFVFAALLKLYALK PVYWFVLQFVLMAVAYV HRCGIDRQPPSTFGGS 

TTTTi nil imn inn ii mini mm nil ii him mil 1 1 ii limn 

orflBa CALVHYCF SXTVQVFVFAALLKL YAL KPVYWFVLQFVLMAVAYV HRCGIDRQPPSTFGGS 
—120 130 140 150 160 170 



or f 18. pep 



100 110 
OT.RLG GLTAALMQVSVLVLLLS E IGRX 

I 1 1 1 1 I I 1 1 I I I 1 MlllliMMIl 

OLRL GGLTAALMQXSVLVLLLSE IGRX 
180 190 200 

The complete length ORF18a nucleotide sequence <SEQ ID 99> is: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 



451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG
551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA
601 AGATAA

This encodes a protein having amino acid sequence <SEQ ID 100>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG
201 R*

ORF18a and ORF18-1 show 99.0% identity in 201 aa overlap: 

10 20 30 40 50 60 

orf!8a.pep MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP 

f 1 1 1 1 1 1 i 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 ] 

orfl8-l MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orfl8a.pep LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH 
MINIM Mi I Hit Mill MINIMI Ml! II I II II ill Ml I II I I MM INI 
orfl8-l LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 1 8a . pep YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG 
Nil N I II II II N II I i M I II I II I M I I M I M i I I M i M II I M I II i N I I I 
orfl8-l YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG 

130 140 150 160 170 180 

190 200 
orfl8a.pep GLTAALMQXSVLVLLLSEIGRX 
I I M I I I I I I I II I I II II II 
orfl8-l GLTAALMQVSVLVLLLSEIGRX 

190 200 

Homology with a predicted ORF from N.gonorrhoeae

ORF18 shows 93.1% identity over a 116aa overlap with a predicted ORF (ORF18.ng) from N. 
gonorrhoeae: 



orf 18. pep 
orf 18ng 
or f 18. pep 
orfl8ng 
orf 18. pep 
orfl8ng 



GNGWQADPEHPLLGLFAVSNVSMTLAFVGI 30 
I I I I I II I I M II II I I II I II II I II II I 
TRAAPLFI PHFYLTLGS I FFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI 115 

CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 90 
N I II I I II II II I II I II I I II I II II I II I I I I II I N I II I I II II I II I I I I I I I I 
CALVH YCFSGTVQV FVFAALLKLYALKPVYW FVLQFVLMAVAYVHRCG I DRQPPST FGGS 175 

QLRLGGLTAALMQVSVLVLLLSEIGR 116 
INN Mi II MM :: ICMII 
QLRLGVLAAMLMQVAVTAMLLAE IGR 201 



The complete length ORF18ng nucleotide sequence is <SEQ ID 101>: 



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt
51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA
101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG
151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA
201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA
251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT
301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC
351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG
401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG
451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG
551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC
601 AGATGA

This encodes a protein having amino acid sequence <SEQ ID 102>: 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP
51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG
201 R*

This ORF18ng protein sequence shows 94.0% identity in 201 aa overlap with ORF18-1: 

10 20 30 40 50 60 

nrfl8-l Dep MILLHLDFLSALLYAAVFLFLI FRAGMLQWFWAS IMLWLGISVLGAKLMPGIWGMTRAAP 

nrflSna MI LLHLDFLS ALLYAAVFLFL I FRAGMLQW FWAS I ALWLG I SVLGVKLMPGMWGMTRAAP 

° r 9 10 20 30 40 50 60 

70 80 90 100 110 120 

nrf1 o_! D gn LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH 
orflb l.pep | 1 1 1 | | 1 1 1 1 | 1 1 1 1 | | | : 1 1 | 1 1 | M | 1 1 | 1 1 | | I I i I I I I I I I I It I I I I i I I M t I I 
nrfl o na LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH 
Orli0 9 70 80 90 100 HO 120 

130 140 150 160 170 180 

YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG 

iimmiMiiiiiiiimiiiiiimmiiiiuimiiuimiiiiiiji 

nrflSna YC FSGTVQVFV FAALLKL YALKPVYW FVLQFVLMAVAYVHRCG IDRQP PST FGGS QLRLG 

y 130 140 150 160 170 180 



orf 18-1. pep 



190 200 

orf 18-1. pep GLTAALMQVSVLVLLLSEIGRX I 

|:| 1111:1 ::!t:lllll 
orfl8na VLAAMLMQVAVTAMLIAEIGRX 

190 200 

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
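Putative transmembrane domains are commonly flagged by averaging residue hydropathy over a sliding window. The Python sketch below is illustrative only and is not the analysis used in the original work: it uses the Kyte-Doolittle hydropathy scale, and the window length (19) and threshold (1.6) are conventional choices adopted here as assumptions. The function names are hypothetical.

```python
# Illustrative sketch: sliding-window hydropathy profile (Kyte-Doolittle
# scale). Window length and threshold are assumed, conventional values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of each window; unknown residues score 0.0."""
    scores = [KYTE_DOOLITTLE.get(r, 0.0) for r in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list:
    """1-based start positions of windows whose mean meets the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, value in enumerate(profile) if value >= threshold]


# Example: first 50 residues of the ORF18ng listing (SEQ ID 102).
segment = "MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMP"
print(hydrophobic_windows(segment))
```

Runs of consecutive window starts above the threshold mark candidate membrane-spanning segments; the exact boundaries depend on the window and threshold chosen.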

Example 13 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 103>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA 

301 GGCGCGGNCG ... 

This corresponds to the amino acid sequence <SEQ ID 104; ORF19>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL 
101 GAX. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 105>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC 
201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA 

301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC 

451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 

501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC

701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 

751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence <SEQ ID 106; ORF19-1>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD

51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL

151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG

301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE

351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP

451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE

551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ

601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ

651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY

701 YRAYRQIPHR QPQNAA*

Computer analysis of this amino acid sequence gave the following results: 

Homology with predicted transmembrane protein YHFK of H. influenzae (accession number P44289)
ORF19 and YHFK proteins show 45% aa identity in 97 aa overlap:

orf19 6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65

L +I+++PVF +V AA +W +MP +LGIIAGGLVDLDN TGRLKN+ T

YHFK 5 LNAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64
orf19 66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102

+ F++SS Q +G + +I+ MT++T FT++GA
YHFK 65 LIAFSISSFIVQLHIGKPIQYIVLMTVLTFIFTMIGA 101



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF19 shows 92.2% identity over a 102aa overlap with an ORF (ORF19a) from strain A of N. 
meningitidis: 



                   10        20        30        40        50        60
orf19.pep  MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK
           |||| ||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19a     MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                   10        20        30        40        50        60

                   70        80        90       100
orf19.pep  NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX
           |||:||||||||||:|||||||||||||||||||  |||:||
orf19a     NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
                   70        80        90       100       110       120

orf19a     TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
                  130       140       150       160       170       180



The complete length ORF19a nucleotide sequence <SEQ ID 107> is: 

1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC
201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT
251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT
351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA
401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA
501 CGCCTACGAA GCACTCGGCA GCTACCTCGA AGCCAAAGCC GACTTTTTCG
551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT
651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC
701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT
801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG
851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC
901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA
951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA
1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA
1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT
1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC
1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC
1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC
1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT
1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA
1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT
1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA
1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA
1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT
1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA
1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT
2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG
2151 A

This encodes a protein having amino acid sequence <SEQ ID 108>: 

1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG
301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE
351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*
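
The correspondence between a nucleotide listing and the protein it encodes can be checked mechanically. The following is a minimal Python sketch, assuming the standard genetic code and a reading frame that starts at the first base; the function name `translate` is illustrative and is not part of the original disclosure. It translates the first 30 nucleotides of the ORF19a listing (SEQ ID 107) and should recover the first ten residues of SEQ ID 108.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (assumption: standard code applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring whitespace.

    Codons that are not plain A/C/G/T triplets are reported as 'X';
    a trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 nucleotides of the ORF19a listing, spaces as printed.
print(translate("ATGAAAACCC CACCCCTCAA GCCTCTGCTC"))  # MKTPPLKPLL
```

The same function can be applied to a whole listing once the position numbers are stripped from each row.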



ORF19a and ORF19-1 show 98.3% identity in 716 aa overlap: 



                    10        20        30        40        50        60
orf19a.pep  MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf19a.pep  NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
            |||:||||||||||:||||||||||||||||||||||||:||||||||||||||||||||
orf19-1     NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf19a.pep  TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
            |||||||||||||||||||||||||||||:||||:||||||||:|||||:|||:||||||
orf19-1     TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf19a.pep  DFFDPDEAEWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf19a.pep  DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf19a.pep  RAIEGCRQSLRLLSDSNDNPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
            ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf19-1     RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf19a.pep  ALETGSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
            ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf19a.pep  CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf19a.pep  STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf19a.pep  AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf19a.pep  PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
                   610       620       630       640       650       660

                   670       680       690       700       710
orf19a.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                   670       680       690       700       710
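
The percent-identity figure quoted for an alignment is the share of aligned columns in which both sequences carry the same residue. The following is a minimal Python sketch of that arithmetic for an ungapped alignment; the function name `percent_identity` and the variable names are illustrative assumptions. It is applied here only to residues 1-60 of ORF19a and ORF19-1, which differ at a single position.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns with identical residues.

    Assumes the two strings are already aligned, ungapped and of equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Residues 1-60 of ORF19a (SEQ ID 108) and ORF19-1 (SEQ ID 106).
orf19a_1_60 = "MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK"
orf19_1_1_60 = "MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK"

print(round(percent_identity(orf19a_1_60, orf19_1_1_60), 1))  # 98.3
```

By the same arithmetic, 98.3% identity over a 716 aa overlap corresponds to roughly 704 identical positions.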

Homology with a predicted ORF from N. gonorrhoeae 

ORF19 shows 95.1% identity over a 102aa overlap with a predicted ORF (ORF19.ng) from N. 
gonorrhoeae: 

orf19.pep  MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 60
           |||||||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19ng    MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 60

orf19.pep  NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 103
           |||:||||||||||||||||||||||||||||||  ||||||
orf19ng    NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 120

An ORF19ng nucleotide sequence <SEQ ID 109> is predicted to encode a protein having amino 
acid sequence <SEQ ID 110>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL 

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII 

151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 

251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG 

301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE 

351 NDRMGDTRIA ALETGSFKNT * 

Further work revealed the complete nucleotide sequence <SEQ ID 111>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTC TTTAGCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 

201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA 

301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 

351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC 

451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 

501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GCTACTACTT CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC 

751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG 

851 CCCAAGCCAT CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA 

901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA 

951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca 

1001 GCGTcgacca gcagtTCcgc caactCCGAC ACAgcgactC CCCCGCcgaa 

1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT 

1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCgaag cCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC 

1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC 

1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 

1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA 

1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA 

1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT 

2001 GCGCGGCGAA CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC 

2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 

2151 A 



This corresponds to the amino acid sequence <SEQ ID 112; ORF19ng-1>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ
551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*



ORF19ng-1 and ORF19-1 show 95.5% identity in 716 aa overlap: 

                     10        20        30        40        50        60
orf19-1.pep  MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf19-1.pep  NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
             |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf19-1.pep  TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
             |||||||||||||||||||||||||||||:||||:||||||||||||||:||||||||||
orf19ng-1    TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf19-1.pep  DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf19-1.pep  DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
             |||||||||||||||||||||||||||||:||||||||||||||||:|::||||||||||
orf19ng-1    DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf19-1.pep  RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
             |||||||||||||||:||||||||| ||||||||||||||||:|:   ||||||||||||
orf19ng-1    RAIEGCRQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIA
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf19-1.pep  ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
             ||||:|:||||||||||||||| |||||||||||||||||||||||||||||||||||||
orf19ng-1    ALETGSFKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
                    370       380       390       400       410       420

                    430       440       450       460       470       480
orf19-1.pep  CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
             |||||||||||| ||||||||||||||||||||||||||||||||:||||||||||||||
orf19ng-1    CQPNYTATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSF
                    430       440       450       460       470       480

                    490       500       510       520       530       540
orf19-1.pep  STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
                    490       500       510       520       530       540

                    550       560       570       580       590       600
orf19-1.pep  AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
             ||||:|:||:||:||||:||||||:||| |||||||||||||||||||||||||||||||
orf19ng-1    AVCSSGTYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
                    550       560       570       580       590       600

                    610       620       630       640       650       660
orf19-1.pep  PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
             |||||||||||||||||||||||||||||||||||||||||||||||||||||:  ||||
orf19ng-1    PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPDMGPDDF
                    610       620       630       640       650       660

                    670       680       690       700       710
orf19-1.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
             ||||||||||| ||||:||||||||||||||||||||||||||||||||||||||||
orf19ng-1    QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                    670       680       690       700       710

In addition, ORF19ng-l shows significant homology to a hypothetical gonococcal protein 
previously entered in the databases: 

sp|O33369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnl|PID|e1154438 
(AJ002423) hypothetical protein [Neisseria gonorrh] Length = 417 
Score = 1512 (705.6 bits), Expect = 5.3e-203, P = 5.3e-203 
Identities = 301/326 (92%), Positives = 306/326 (93%) 

Query: 307
           RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS
Sbjct:   1

Query: 367
           FKNTWQAIRPQLNLES VFRHAVRLSLVVAAACTIVEALNLNLGYWILLT LFVCQPNYT
Sbjct:  61

Query: 427
           ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT
Sbjct: 121

Query: 487
           IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG
Sbjct: 181

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606
           TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+  P
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300

Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632
           K   ALTGYISALG   ++  +  +P
Sbjct: 301 KPATALTGYISALGHTAAKCTKNAAP 326

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology 
with the YHFK protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 14 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
113>: 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGAGTT 

351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT 

401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT 

451 ACTCAATTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC 

501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT 

551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA 

601 ACTCGrmTTC GAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC 

651 CCAAACtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG 

701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA 

751 CACGATTTTc GCGTCtTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT 

801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC 

851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC 

901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc 

951 TGACGGTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG 

1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC 

1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA 

1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAmGCCC 

1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs 

1201 CTTTAyCGGC CCACTrrAAC rCa^TCGGAC TTTCGCTTGC CATCGGTCTG 

1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG 

1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT 

1351 GcTCTCGCTC GCCGTGA 

This corresponds to the amino acid sequence <SEQ ID 114; ORF20>: 

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ 

201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX 

401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC 

451 SRSP* 
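
The X residues in this partial translation coincide with codons that contain ambiguous or unread bases in the nucleotide listing (lower-case y, r, m, w, s, or a dot). The following is a minimal Python sketch of one common convention for such codons, assuming the standard genetic code and the IUPAC ambiguity codes; the function name `translate_codon` and the treatment of "." as a fully unknown base are assumptions made for illustration. A codon is reported as X unless every possible reading gives the same amino acid.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes; "." is treated as an unknown base (assumption).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", ".": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one codon, returning 'X' when its readings disagree."""
    options = [IUPAC[base] for base in codon.upper()]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


print(translate_codon("CyG"))  # X  (CCG = Pro, CTG = Leu)
print(translate_codon("GCN"))  # A  (all four readings encode Ala)
```

For example, the codon CyG in the listing above appears to underlie the X in the stretch VTAXA, which reads VTALA in the elaborated sequence.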

These sequences were elaborated, and the complete DNA sequence <SEQ ID 115> is: 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT
201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GAGGCTTTTA
251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG
301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT
351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA
401 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA
451 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACGTTTCT
501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC
551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA
601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC
651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG
701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA
801 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG
851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT
951 GACGCTGCCG GCGGCGGTCG GACTGGCGGT GTTGTCGTTC CCGCTGGTGG
1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGCT TAATCATGAT
1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG
1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC
1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG
1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA
1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC
1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC
1401 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA



This corresponds to the amino acid sequence <SEQ ID 116; ORF20-1>: 



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFGIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ 

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR EFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA 

401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL 

451 SLAVMCGGLW AAQAYLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL 

501 GFRPRHFKRV EN* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the MviN virulence factor of S. typhimurium (accession number P37169) 
ORF20 and MviN proteins show 63% aa identity in 440aa overlap: 

Orf20   1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
          MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
MviN   14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Orf20  61 AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
          +QAFVPILAEYK  + +EA   F+ +V+G+L+  L +VT  G+LAAPWVI V+AP FA
MviN   74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLAIANAH^ 133

Orf20 121 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
          ADKF L+  LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF  P
MviN  134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Orf20 181 YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
          YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D    RV+KQM PAILGV
MviN  194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Orf20 241 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
          SV+Q+SL+INTIFAS+L SGSVSWMYYADR+ME PSGVLG ALGTILLP+LSK  A+ +
MviN  254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Orf20 301 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
          +++  L+DWGLRLC LL LP+AV L +L+ PL  +LF Y  FT FDA MTQ ALIAYS G
MviN  314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Orf20 361 LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXXXXXXXXXXXXXXXXXCI 420
          LIGLI++KVLAPGFY+RQ+I  PVKIAI TLI  QLMNL F                 C+
MviN  374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Orf20 421 NAGLLFYLLRRHGIYQPXQG 440
          NA LL++ LR+  I+ P  G
MviN  434 NASLLYWQLRKQNIFTPQPG 453

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF20 shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) from strain A of N. 
meningitidis: 

                   10        20        30        40        50        60
orf20.pep  MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
           |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf20a     MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
                   10        20        30        40        50        60

                   70        80        90       100       110       120
orf20.pep  AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD
           |||||||||||||||||||:|||||||||||||||||||||||||||||||||||:||:|
orf20a     AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD
                   70        80        90       100       110       120

                  130       140       150       160       170       180
orf20.pep  ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP
           |||||||||||||||||||||||||||||||||||||:||||||:|||||||||||||||
orf20a     ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP
                  130       140       150       160       170       180

                  190       200       210       220       230       240
orf20.pep  YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
           |||||||| |||||||||||| ||||||||||||||||||||||||||||||||||||||
orf20a     YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
                  190       200       210       220       230       240

                  250       260       270       280       290       300
orf20.pep  SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT
           ||||:||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf20a     SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT
                  250       260       270       280       290       300

                  310       320       330       340       350       360
orf20.pep  EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG
           |||||||||||| |||||||||||:||||||||||||||| |||||||||||||||||||
orf20a     EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
                  310       320       330       340       350       360

                  370       380       390       400       410       420
orf20.pep  LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI
           |||||||||||||||||||| :|||||||||||:||||| | |||  :||||||||||||
orf20a     LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI
                  370       380       390       400       410       420

                  430       440       450
orf20.pep  NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSPX
           ||||||||||||||||| :| :: | :
orf20a     NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA
                  430       440       450       460       470       480

The complete length ORF20a nucleotide sequence <SEQ ID 117> is: 

1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG 
101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG 
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA 

451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CAACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT 

1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC 

1401 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes a protein having amino acid sequence <SEQ ID 118>: 

1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFSIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ 

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQISLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRXCMLLTLP AAVGMAVLSF PLVATLFMYR EFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA 

401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL 

451 SLAVMGGGLY AAQIWLPFDW AHAGGMQKAA RLFILIAVGG GLYFASLAAL 

501 GFRPRHFKRV ES* 

ORF20a and ORF20-1 show 96.5% identity in 512 aa overlap: 

10 20 30 40 50 60 

MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
Itillf l:MI II 11 IN II I I I I! II I II I II II I III! Ill M I I II M [| M I! Ml 
MNMLGALAKVGSLTMVSRVLG FVRDTV I ARAFGAGMAT DAFFVAFKL PNLLRRVFAEGAF 
10 20 30 40 50 60 

70 80 90 100 110 120 

AQAFVPILAE YKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD 

H I M I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I t I I I I I I I I I I I I I I I I : I 

AQAFVPILAE YKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 
70 80 90 100 110 120 

130 140 150 160 170 180 

ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP 
I III IN IMIIM II I IIMMIII Ml M II I 111:11 INI II I II IIMIMM II 
ADKFQLS I DLLRITFPYILLISLSSFVGSVLNSYHKFG I PAFTPTFLNVS FIVFALFFVP 
130 140 150 160 170 180 

190 200 210 220 230 240 

YFDPPVTALAWAVFVGG I LQLGFQLPWLAKLGFLKLPKLS FKDAAVNRVMKQMAPAI LGV 
I I I I II I I I I I II I I I I I I I II I I li I I I I I I I I I I I I I I I I I I I I | | | | [ | | | || | | | | 
YFDPPVTALAWAVFVGG ILQLGFQLPWLAKLGFLKLPKLS FKDAAVNRVMKQMAPAI LGV 
190 200 210 220 230 240 

250 260 270 280 290 300 

SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 



I M I : I I I I II i I I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I I I I II I I I I I I I I I I I 
orf20-1 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT

250 260 270 280 290 300 

310 320 330 340 350 . 360 

or f 20a . pep EQFSALLDWGLRXCMLLT L PAAVGMAVLS FPLVATL FMYRE FTL FDAQMTQHALI AYS FG 

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | I 
Orf20-1 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

310 320 330 340 350 360 

370 380 390 400 410 420 

or f 20a . pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNIAFIGPLKHVGLSLAIGLGACI 
Mill M II M II I I I I I I I I I I M I I I I I I I I i II U I I 1 I M I I I I II I I M I I I I I I 
orf20-l LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 20a . pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSIJWMGGGLYAAQIWLPFDWAHAGGMQKAA 
IIMMII III INI II 111 II II I II I I Mil II 111:111 :| 11:1 I MM MM: 
orf 20-1 NAGLL FYLLRRHG I YQPGKGWAAFLAKMLLS LAVMCGGLWAAQAYLPFEWAHAGGMRKAG 

430 440 450 460 470 480 

490 500 510 

orf 20a . pep RLFILIAVGGGLYFASLAALGFRPRHFKRVESX 
: I II I I Mil MM II II I M I II M M II : I 
orf 20-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVENX 

490 500 510 

Homology with a predicted ORF from N. gonorrhoeae

ORF20 shows 92.1% identity over a 454aa overlap with a predicted ORF (ORF20ng) from N. 
gonorrhoeae: 



orf20.pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
I M M II II II I II II I M M II I II M M I li II II I I II II II M M I II II M II I I 
orf20ng MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60

orf20.pep AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
I I I M II II I I Ml I II I I : I II I II I II I I M M :: I I II II II M I i II II II : I :: I 
orf20ng AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120

orf20.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
II Ml II Ml MMMI II III MM II 1:1 MM II III I II 1:1 11:11111 MUM 
orf20ng ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180

orf20.pep YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
IIMMII I I I 1 t I i I I t I t I I II M II I I I I II I II : I I I I I i II II II I II II M I 
orf20ng YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240

orf20.pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
MM: II MMMI III Mill II MMMI Ml MM III II II Ml MM ill Mill 
orf20ng SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 300

orf20.pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
M M I II M II I II I I I M M I : I M II I II II II II II I M M I II II II I I II II II 
orf20ng EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360

orf20.pep LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 420
M M I M M M MMMM : II II II II I II : I II II i III I II I M II I II I 
orf20ng LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420

orf20.pep NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSP 454
II II I I : I : I : I II I : I MM: MMMM 
orf20ng NAGLLFFLFRKHGIYRPGQGLGQPSWRKCCSRSP 454

An ORF20ng nucleotide sequence <SEQ ID 119> was predicted to encode a protein having amino
acid sequence <SEQ ID 120>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI 

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ 

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA 

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC 

451 SRSP* 

Further DNA sequence analysis revealed the following DNA sequence <SEQ ID 121>: 



1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC
51 GCGCGTTTTG GGATTTGTGC GCGATACQGT CATTGCGCGG GCATTCGGCG
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT
201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA
251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG
301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtccg CgcccGGCTT
351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA
401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA
451 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT
501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC
551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG
601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC
651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG
701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT GgttATCAAC
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta
801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG
851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT
951 GACGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG
1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT
1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG
1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC
1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG
1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA
1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC
1351 GCGCTCGCCG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC
1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA



This encodes the following amino acid sequence <SEQ ID 122; ORF20ng-1>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLLR KHGIYRPGRG WAAFLAKMLL

451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES*

ORF20ng-1 and ORF20-1 show 95.7% identity in 512 aa overlap:



10 20 30 40 50 60 

or f 2 0- 1 . pep MNMLGALAKVGSLTMVS RVLG FVRDTV I ARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
1 1 I I 1 1 1 1 III II I II I M II I III 1 1 I II I I If II I II III I I I M t I I M II II I 1 1 I 
orf20ng-l MNMLGALAKVGSLTMVS RVLG FVRDTV I ARAFGAGMAT DAFFVAFKLPNLLRRVFAEGAF 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 20-1 .pep AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 
III I IIMIIIII I 111 11:1 IIIIIMIII III l::IMM I Ml II I I IIIM |1::| 
orf20ng-1 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD

130 140 150 160 170 180

orf 20-1 . pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP 
I I I I I ! II : I I I I I i I I I I I I I I I I I I I I : I I I I I I I I I I 1 I I I I I i I : M I I I I I I I I I 
orf20ng-l ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 

130 140 150 160 170 180

190 200 210 220 230 240

orf 20-1 . pep Y FDPPVTALAWAVFVGG I LQLG FQLPWLAKLG FLKLPKLS FKDAAVNRVMKQMAPAI LGV 
I I I I M I I I I I I I I I I I I I I II I I ft I I I I I t I I I I I I I : I I II I I I I I I I I I I I I M M 
orf20ng-l YFD P PVT ALAWAVFVGG I LQLG FQL PW LAKLG FLKL PKLN FKDAAVNRVMKQMAPAI LGV 

190 200 210 220 230 240 

250 260 270 280 290 300 

or f 20-1 . pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 
I I I I : I I I I I I I I I I I I I I I I 11 I I I I I I I II I I I I I I I I I II I I I II I I | | | | | | | | 
orf20ng-l S VAQ I S LV I NT I FAS Y LQSGS VSWMYY ADRMMELRRGVLGAALGT I LLPTLSKHSANQDT 

250 260 270 280 290 300 

310 . 320 330 340 350 360 

orf 20-1 . pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 
I I I 1 I I I I I I I II I I I I I I II I : I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I | | I | | 
orf20ng-l EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

310 320 330 340 350 360 

370 380 390 400 410 420 

or f 20-1 . pep LIGLIMIKVIAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 
I I I I I I I I I I I II INI lillllll II II Ml I II I II I II Mil 1:111111111111 
orf20ng-l LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 

370 380 390 400 410 420 






430 440 450 460 470 480 

orf 20-1 . pep N AGLL FYLLRRHG I YQPGKGWAAFLAKMLL SLAVMCGGLWAAQAYLPFEWAHAGGMRKAG 
I M I I I : I i I : I I I I : I I : I I I I I I II I I I : I I I I I I I M I I I I I I I I I I I I I I I I I I I 
or f 2 Ong- 1 N AGLL FFLLRKHG I YRPGRGWAAFLAKMLLALAVMCGGLWAAQAC L P FEWAHAGGMRKAG 

430 440 450 460 470 480 






490 500 510 

orf 20-1 - pep QLCILIAVGGGLYFASLAALGFRPRHFKRVENX 
1 1 I I M 1 1 I I I I I II I I I I I I I I I I I I I I I I : I 
orf20ng-l QLCILIAVGGGLYFASLAALGFRPRHFKRVESX 

490 500 510 

In addition, ORF20ng-1 shows significant homology with a virulence factor of S. typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium]
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524

Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220

Identities = 309/467 (66%), Positives = 368/467 (78%)

MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF 



+QAFVPILAEYK + +EAT F+ +V+G+L+ L WT G+LAAPWVI V+APGF 



ADKF L+ LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF P 



(The alignment rows begin at Query positions 1, 61, 121, 181 and 241, corresponding to Sbjct positions 14, 74, 134, 194 and 254.)

YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D RV+KQM PAILGV 
YFN P PVXALAWAVTVGGVLQLVYQLP YLKK IGMLVLPRIN FRDTGAMRVVKQMGPAILG V 



253 



SV+QISL+INTIFAS+L SGSVSWMYYADR+ME GVLG ALGTILLP+LSK A+ + 
SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

+++ L+DWGLRLC LL LP+A L +L+ PL +LF Y +FT FDA MTQ ALIAYS G
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420
LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+
Sbjct: 374 LIGLIWBCVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467
NA LL++ LRK I+ P GW VM L+ +P
Sbjct: 434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480

Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 14/41 (34%), Positives = 23/41 (56%)

Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509
EW+ + + +L ++ G YFA+LA LGF+ + F R
Sbjct: 481 EWSQGSMLWRLLRLMAWIAGIAAYFAALAVLGFKVKEFVR 521
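
The identity and positive percentages in a BLAST summary line can be re-derived from the match counts, which is a convenient consistency check on a transcribed report. The following is a minimal sketch only: the function name `parse_blast_counts` is hypothetical, and it assumes that the printed percentage is the integer part of 100 x matches / length.

```python
import re


def parse_blast_counts(summary: str) -> dict:
    """Extract 'name = matches/length (percent%)' fields from a BLAST summary line."""
    pattern = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")
    return {
        name: (int(matches), int(length), int(percent))
        for name, matches, length, percent in pattern.findall(summary)
    }


# Summary line of the first segment reported above for ORF20ng-1 against MviN.
line = "Identities = 309/467 (66%), Positives = 368/467 (78%)"
counts = parse_blast_counts(line)

for name, (matches, length, reported) in counts.items():
    # Assumption: the report truncates the percentage to an integer.
    recomputed = 100 * matches // length
    print(name, f"{matches}/{length}", "reported", reported, "recomputed", recomputed)
```

A mismatch between the reported and recomputed values would suggest a transcription error in one of the counts rather than a property of the alignment itself.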

Based on this analysis, including the homology with a virulence factor from S.typhimurium, it is 
predicted that these proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 


Example 15 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 123>:

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA tGGACACCAA TCCG. . 

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>: 



1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI
101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 
151 VNAMDTNP . . 
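
The correspondence between SEQ ID 123 and SEQ ID 124 can be checked by translating the nucleotide sequence in its first reading frame. The sketch below is illustrative only: the function `translate` and the table construction are not part of the original disclosure, the standard genetic code is assumed, and any codon containing a non-ACGT symbol is reported as X, which is how the ambiguous base `r` at nucleotide 286 gives the X at residue 96.

```python
# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with ambiguous bases become 'X'."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# Nucleotides 1-60 of SEQ ID 123 as printed above.
start = "atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA GCAAGCCGTT"
print(translate(start))

# Nucleotides 280-288 of SEQ ID 123, which include the ambiguity code 'r'.
print(translate("GTTGAArGC"))
```

In the complete sequence <SEQ ID 125> the same codon reads GGC, so residue 96 is resolved as G in ORF22-1.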

Further work revealed the complete nucleotide sequence <SEQ ID 125>: 



1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT 

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT 

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG 

801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 
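
A numbered listing such as the one above can be joined into a single string while checking that each row label agrees with the running length, which catches dropped or duplicated blocks. This is a minimal sketch under stated assumptions: the helper `parse_listing` is hypothetical, and each row is assumed to be a position label followed by blocks of bases.

```python
def parse_listing(text: str) -> str:
    """Join a numbered sequence listing into one string, checking every row label."""
    sequence = ""
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # blank separator line between rows
        offset, blocks = int(fields[0]), fields[1:]
        if offset != len(sequence) + 1:
            raise ValueError(f"row labelled {offset} starts at {len(sequence) + 1}")
        sequence += "".join(blocks)
    return sequence


# First two rows of SEQ ID 125 as printed above.
listing = """
1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG
"""
seq = parse_listing(listing)
print(len(seq))

# The final row starts at 1301 and holds 44 bases, so the gene is 1344 nt long:
# 448 codons, i.e. 447 amino acids plus the stop codon.
last_row = "CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA"
total_nt = 1300 + len(last_row.replace(" ", ""))
print(total_nt, total_nt // 3 - 1)
```

The resulting length of 447 residues agrees with the 447 aa overlaps reported for ORF22-1 in the comparisons that follow.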

This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 127>:

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT 

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA 

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA 

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT 

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT 

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA 

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG 

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC 

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA 

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG* 

The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa 
overlap with ORF22a: 

10 20 30 40 50 60 

orf22.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
M I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I II I I I I II t I ! i I I I ! II I I I I I I 
orf22a MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED

70 80 90 100 110 120

orf22.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR
II I I I I 1 II 1:11 I IMI I I Ml ! IN Mill II I I I I I I I I II I I I I I I I I I I 
orf22a KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX

70 80 90 100 110 120

130 140 150

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP
M I I I 1 I I I I I I : I I I I I I I I I I I I I I I I I I II I I I I I 
orf22a NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV

130 140 150 160 170 180 

The complete strain B sequence (ORF22-1) and ORF22a show 94.9% identity in 447 aa overlap: 

10 20 30 40 , 50 60 

or f 22a . pep MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 

I I I II I I I I I I II I I I I I : : I I I I : I I I I I I I I I I II I I I I I M I I I I I I II I I I I II I 
orf 22-1 MIKIKKGLN LP I AGRPEQAVYDGPAITEVALLGEE YAGMRPSMKVKEGDAVKKGQVL FE D 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 22a . pep KKXPGWFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 

II I I I II I I I : II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 1 I I I I I M I I I 
orf 22-1 KKNPGWFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR 

70 80 90 100 110 120 

130 140 150 160 170 1,80 

orf 22a . pep NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVWIKEAXXDFRRXXLV 
M I I I M I I I 1 I : I M I I I I I I II I I I I 1! I II I I I M I M I I : I : I I M ||:| || 
orf 22-1 NLIQSGLWTALRTRPFSKIPAVDAEPFAI FVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 22a . pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

1 1 i i r 1 1 f 1 1 1 i 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 } i r iiiiiiiiinii 

orf 22-1 LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTWTI 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 22a . pep N YQDVI AIGRLFATGRLNTERVI ALGG SQVNKPRLLRT VLGAKVS QI TAGE LVDADNRV I 

I I II 11:11 MM II III llllil I IIIIMI IIMI I llllf III I III I 111:111 I! 
orf 22-1 N YQDVIT I GRLFATGRLNTERVI ALGG SQVNKPRLLRT VLGAKVS Q ITAGE LVDT DNRV I 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 22a . pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 
I IIMI II IMIMIIII II IMIII MIMII MUM IIIMI I I Ml Mllllll II 
orf 22-1 SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 22a . pep LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

M I I : i I I M i I I 1 1 II I I I i II 1 I M I IE I 1 I ! I I I II 1 1 I I 1 I I II II I 1 1 I I 

orf 22-1 LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

370 380 390 400 410 420 

430 440 
orf 22a . pep LCS FVC PGKYEXG PLLRKVLETXEKEGX 

I I I I I II I I I I II II I II II I I I I I I 
orf 22-1 LCSFVCPGKYEYGPLLRKVLETIEKEGX 

430 440 

Further work identified a partial gene sequence <SEQ ID 129> from N.gonorrhoeae, which 
encodes the following amino acid sequence <SEQ ID 130; ORF22ng>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI
101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HN*

Further work identified the complete gonococcal gene <SEQ ID 131>:

1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT 

201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA 

351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT

501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC 

551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC 

651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT 

751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG 

801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG 

851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC 

1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG 

1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 132; ORF22ng-1>:

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 

The originally-identified partial strain B sequence (ORF22) shows 93.7% identity over a 158aa 
overlap with ORF22ng: 

orf22.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
MiiltlIlilllillll::lllMIIIII!l||i|:|||||||:|M:||||||MIII 
orf22ng MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60

orf22.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
t t I I t I I I I I i I I I I 1 I t I I I I I I I I I I t t I 1 I I I IMIMII!:illi|:|l:|:||| 
orf22ng KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 120

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
UN Ml I IN I I 11 Mi II III I I Ml MM MUM 
orf22ng NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 180

The complete sequences from strain B (ORF22-1) and gonococcus (ORF22ng-1) show 96.2%
identity in 447 aa overlap: 

10 20 30 40 50 60 

or f 2 2 - 1 . pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEE YAGMRPSMKVKEGDAVKKGQVLFED 



WO 99/24578 



-126- 



PCT/IB98/01665 






I II III II I I I I I I I M l::U MM II I I I I I I ! I : I I ! 1 I I I : I I ! : I I I jlllllll 
orf22ng-l MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 22-1 . pep KKNPGWFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR 
I Ml II IMM II ! I III MMMMII IMIII Ml IMMMIMM llrMMMil 
orf22ng-l KKNPGWFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 22-1 . pep NL I QSGLWT ALRTRPFSKI PAVDAE PFAI FVNAMDTN PLAAD PT VI I KEAAE D FKRGLLV 
I I II I I I I I I I II I M M I 1 1 M I M 1 1 M I I M I I I I I I M M I I I M I I I II M I M I 
orf22ng-l NLIQSGLWTALRTRPFSKI PAVDAEPFAI FVNAMDTN PLAAD PTVI I KEAAED FKRGLLV 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 22-1 . pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 
I I I I M I I II II I M I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I | 
orf22ng-l LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 22-1 - pep NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 
llllll:lllll:IMIMIII:llll I M I I I I I I I I I I I I I I i : I I I I I I I : I M I I 
orf22ng-l NYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADNRVI 

250 260 270 280 290 \ 300 

310 320 330 340 350 360 

or f 22-1 . pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 
I I I I I II I I : I I I I I I I I I I I I I I I I I I I II I I I I I M I I I I I I I I I I I I I I I I I I I II I 
orf22ng-l SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 22-1 . pep LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 
I I I I : I M I It I I I M I I j I I I I 1 1 I 1 1 I f I I j I I I I I I I I E I I 1 1 I I I I i I M f M I M 
orf22ng-l LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

370 380 390 400 410 420 

430 440 
orf 22-1 . pep LCS FVCPGKYEYGPLLRKVLETIEKEGX 
I I I I ! I I I I I I I I I II I I I I I I II I I I I 
orf22ng-l LCS FVCPGKYEYGPLLRKVLETIEKEGX 

430 440 
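
The percent identity figures quoted for these pairwise comparisons are the number of identical aligned positions divided by the overlap length. The following sketch illustrates this for the first 60 residues of ORF22-1 and ORF22ng-1 as listed above; the function `percent_identity` is a hypothetical helper that assumes an ungapped alignment of equal-length strings, as is the case for the alignment shown here.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (identical positions, overlap length, percent identity), no gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    identical = sum(x == y for x, y in zip(a, b))
    return identical, len(a), 100.0 * identical / len(a)


# Residues 1-60 of ORF22-1 (strain B) and ORF22ng-1 (gonococcus).
orf22_1 = "MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED"
orf22ng_1 = "MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED"

identical, overlap, pct = percent_identity(orf22_1, orf22ng_1)
print(f"{identical}/{overlap} identical ({pct:.1f}%)")
```

The N-terminal 60 residues are less conserved than the sequence as a whole, which is consistent with the higher identity reported over the full 447 aa overlap.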

Computer analysis of these sequences gave the following results: 

Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession number U24492). 
ORF22 and this 48kDa protein show 72% aa identity in 158aa overlap:

MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
MI IKKGL+LPIAG P Q +++G + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED
MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
KKNPGVVFTAPASG + I+RGEKRVLQSVVI VE +++I F RY LA+LS E+V++
KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP



ORF22a also shows homology to the 48kDa Actinobacillus pleuropneumoniae protein: 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449
Score = 530 bits (1351), Expect = e-150
Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%)

Query: 1 MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 60 

MI IKKGL+LPIAG P QVI++G + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED 
Sbjct: 1 MIT IKKGLDL P I AGT PAQV I HNGNTVNEVAMLGEE YVGMRPSMKVREGDWKKGQVLFE D 60 

Query: 61 KKXPGWETAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 120 

KK PGWFTAP SG + I +RGEKRVLQS WI VEG+++I F RY LA+LS + 
Sbjct: 61 KKN PG WFTAPASGTWT INRGEKRVLQS WI KVEGDEQ IT FTRYEAAQLAS LSAEQVKQ 120 

Query: 121 NLIQSGLWTALRXRPFSKIPAVDAEPFAI FVNAMDTNPLAADPWVIKEAXXDFRRXXLV 180 

NLI+SGLWTA R RPFSK+PA+DA P +IFVNAMDTNPLAADP W+KE DF+ V 
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPIAADPEVVLKEYETDFKDGLTV 180 

Query: 181 LSRL — TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 237 

L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V 

Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240 

Query: 238 WT IN YQDVIAI GRL FATGRLNTERVI ALGGSQVNKPRLLRT VLGAKVSQ ITAGE LVDADN 297 

W +N YQDVIAI G+L F TG L T+R+I+L G QV PRL+RT LGA +SQ+TA EL +N 
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300 

Query: 298 RVISGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 357 

RVISGSVL+GA G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF 
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360 

Query: 358 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 417 

K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ 
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419 

Query: 418 XXXXXS FVC PGKYEXG PLLRKVLETXEKEG 447 
++VCPGK GP+LR LE EKEG 

ORF22ng-1 also shows homology with the OMP from A.pleuropneumoniae:

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus
pleuropneumoniae] Length = 449
Score = 555 bits (1414), Expect = e-157

Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)

Query: 27 MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 86 

MI IKKGL+LPIAG P QVI++G + EVA+LGEEYVGMRPSMK++EG+ VKKGQVLFED 
Sbjct: 1 M I T I KKGLDLPIAGTPAQVI HNGNTVNEVAMLGEE YVGMR P SMKVREG DWKKGQVLFE D 60 

Query: 87 KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 14 6 

KKNPGWFTAPASG + I +RGEKRVLQS W I VEG+++I F RY LA LS+E+V++ 
Sbjct: 61 KKN PG WFTAPASGTWT INRGEKRVLQS W I KVEGDEQIT FTRYEAAQLAS LSAEQVKQ 120 

Query: 147 NLIQSGLWTALRTRPFSKI PAVDAE PFAI FVNAMDTNPLAADPTVI IKEAAEDFKRGLLV 206 

NLI+SGLWTA RTRP FSK+ PA+DA P + 1 FVNAMDTN PLAADP V++KE DFK GL V 
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEWLKEYETDFKDGLTV 180 

Query: 207 LSRL — TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 263 

L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V 

Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240 

Query: 264 WTINYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADN 323 

W +NYQDVIAIG+LF TG L T+R+++L G QV PRL+RT LGA +SQLTA EL +N 
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRI I SLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300 

Query: 324 RVISGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 383 

RVISGSVL+GA A G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF 
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360 

Query: 384 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 443 

K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ 
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDI I PTLLLRDLAAGDTDSAQNLGCLELDEE 419 

Query: 444 XXXXXS FVC PGKYEYGPLLRKVLETIEKEG 473 

++VCPGK YGP+LR LE IEKEG 
Sbjct: 420 DLALCTYVCPGKNNYGPMLRAALEKIEKEG 449 
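
The summary statistics printed with the alignment above can be checked with a few lines of Python. This is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the assumption that the report truncates percentages rather than rounding them is inferred only from the numbers shown (337/450 is 74.9% but is printed as 74%).

```python
import re

# Summary line of the ORF22ng-1 / A.pleuropneumoniae hit, as reported above.
SUMMARY = "Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)"


def parse_blast_summary(line):
    """Return {label: (count, alignment_length)} for each 'Label = n/m' field."""
    fields = re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line)
    return {label: (int(n), int(m)) for label, n, m in fields}


def truncated_percent(count, length):
    """Integer percentage rounded down (assumed to match the report)."""
    return 100 * count // length


stats = parse_blast_summary(SUMMARY)
for label, (count, length) in stats.items():
    print(f"{label}: {count}/{length} = {truncated_percent(count, length)}%")
```

Each printed percentage is simply the count divided by the alignment length (450 columns here), so the three figures can be compared directly across different hits.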

Based on this analysis, including the homology with the outer membrane protein of Actinobacillus
pleuropneumoniae, it was predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF22-1 (35.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
5A shows the results of affinity purification of the GST-fusion protein, and Figure 5B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 5C). These 
experiments confirm that ORF22-1 is a surface-exposed protein, and that it is a useful immunogen. 

Example 16 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 133>:

1 ..GCGnCGnAAA TCATCCATCC CC.nACGTC GTAGGCCCTG AAGCCAACTG 

51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG 

101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG 

151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA 

201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC

251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA 

301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT 

351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA 

401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT 

451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC

501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA 

551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT 

601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA 

651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT 

701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC 

751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT 

801 GrkCrmnmTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT 

851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT 

901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA 

951 TCCCGCACCT TAA 

This corresponds to the amino acid sequence <SEQ ID 134; ORF12>: 

1 ..AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL

51 SQEEKDIRHS NEITPLEYKG LIWAGVVFVA LSALLAWSIV PADGILRHPE

101 TGLVSGSPFL KSIVVFIFLL FALPGIVYGR VTRSLRGEQE VVNAXAESMS

151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF 

201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT 

251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI 

301 WVFVLGLPVG PGAPTFYPAP * 
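
The relationship between the DNA listing and the amino acid listing above can be illustrated with a short translation routine. This is a sketch under stated assumptions rather than the method used in the original work: it uses the standard genetic code, reads the sequence in frame 1, and treats any codon containing a symbol other than A, C, G or T (for example the ambiguity codes n, m, y, s, w, k and r seen in the partial sequence) as an undetermined residue X. The names `CODON_TABLE` and `translate` are illustrative only.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate in frame 1; any codon with a non-ACGT symbol becomes 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-mers
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of the complete sequence <SEQ ID 135>.
print(translate("ATGAGTCAAA CCGATACGCA ACGGGACGGA"))
# Start of the partial sequence <SEQ ID 133>, which contains ambiguous bases.
print(translate("GCGnCGnAAA TC"))
```

The first call reproduces the first ten residues of ORF12-1 (MSQTDTQRDG), and the second shows how the two ambiguous codons give the XX seen at positions 2 and 3 of ORF12.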

Further sequence analysis revealed the complete DNA sequence <SEQ ID 135> to be: 

1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA 

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA



This corresponds to the amino acid sequence <SEQ ID 136; ORF12-1>:

1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIYI VSLLNADGFI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF12 shows 96.3% identity over a 320aa overlap with an ORF (ORF12a) from strain A of N.
meningitidis:

10 20 30 

orf 12 . pep AXXIIHPXXWGPEANWFFMVASTFVIALI 

I fill I I I I I I I I I I I I I I I I I I I I I 
orf 12a AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALI 
180 190 200 210 220 230 

40 50 60 70 80 90 

orf 12 . pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
I I I I I I I I I i I I I I I I I I I I I I I I I I I I II I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 12a GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
240 250 260 270 280 290 

100 110 120 130 140 150 

orf 12 . pep PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAXAESMS 

I ! I ! I I I M i I I 1 I I I I I I I I I I I I I I I I I I I II I I I! M I I I I I I I I t 1 I I I I Mill 
orf 12a PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMS 

300 310 320 330 340 350 

160 170 180 190 200 210 

orf 12. pep T LXLXLXXI FFAAQFVAFFNWTN IGQY I AVKGATFLKEVGLGG S VLFIGFI L ICAFINLM 

II I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I II I I I I I I I M II | | | | | | I | | | 
orf 12a TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 

360 370 380 390 400 410 

220 230 240 250 260 270 

orf 12 . pep IGSASAQWAVTAPI FVPMLMLAGYAPEVIQAAYRIGDSVTNI ITPMMS YFGLIMATVXXY 
I 1 1 1 1 I 1 1 1 1 1 1 1 1 I J I I i I I I I I I 1 1 i I 1 1 f I I i i 1 I ! 1 I i i I I I I J I ! I 1 1 1 1 I ! I 
orf12a IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY
420 430 440 450 460 470



280 290 300 310 320 

or f 12 . pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I 
orfl2a KKDAGVGT LI SMMLPYS AFFLI AWI ALFC IWVFVLGLPVG PGAPT FY PAPX 

480 490 500 510 520 
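
The identity figure quoted for this alignment can be illustrated with a short calculation. The sketch below is not the program used in the original analysis; it assumes two sequences that are already aligned column by column and counts an undetermined residue (X) as a mismatch. The function name `percent_identity` is hypothetical.

```python
def percent_identity(a, b, unknown="X"):
    """Percent of aligned columns with identical residues; X never matches."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != unknown)
    return 100.0 * same / len(a)


# Residues 151-160 of ORF12 and the matching stretch of ORF12a.
orf12 = "TLXLXLXXIF"
orf12a = "TLGLYLVIIF"
print(percent_identity(orf12, orf12a))

# ORF12 as listed contains 12 undetermined (X) positions in 320 residues.
print(100.0 * (320 - 12) / 320)
```

If the 12 undetermined positions in ORF12 were the only columns that fail to match, 308 of 320 columns would be identical (96.25%), which is in line with the 96.3% reported above.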

The complete length ORF12a nucleotide sequence <SEQ ID 137> is: 



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA

101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT

201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC

351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA



This encodes a protein having amino acid sequence <SEQ ID 138>: 



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAAGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*



ORF12a and ORF12-1 show 99.0% identity in 522 aa overlap: 



10 20 30 40 50 60 

orf 12a . pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFI I FIVLLL I AS AAGAYFGLS VPDPRPVGAK 

I lllllllltll II IMMMIIIMII I I I II I 1 111 1111:111 II I I illlllll II 
orf 12-1 MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFI I FIVLLLIASAVGAYFGLS VPDPRPVGAK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 12a . pep GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 
I I i I I I It:: I I I I : I I I: M I I I I I I I I I I I I I I M I I I I I 11 I I M I I I I I I I I I I M 
orf 12-1 GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISAI^dR 
70 80 90 100 110 120



130 140 150 160 170 180 

orfl2a.pep IJiLTKSPRKLTTEHWFTGILSNTASELGYVVLIPLSAIIETiSLGRHPLAGLAAAFAGVS 

II I I I I I I I I I ! I I I I I I I I I I I MliilllltlMllllll III II II llllll 

orfl2-l LLLTKS PRKLTT FMWFTG I LSNTASELG YWLI PLSAI I FHSLGRHPLAGLAAAFAGVS 

130 140 150 160 170 180 

190 200 210 220 230 240 

orfl2a.pep GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 
llllllllll MINI MINIM Ml Mllllil MM II II MM Mill MINIM 
orfl2-l GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 12a . pep VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWEVALSALLAWSIVPADGILRH 
II I I N I N I N I I II I I II I II I I I I II I I I II I N II I I I II I II I I II I I I I I I I I I 
orf 12-1 VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 12a. pep PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 
I I I 1 1 1 1 M I M I M I 1 1 I I M I I I M I II 1 1 I 1 1 M 1 1 I I I I II II I I I I M I 1 1 1 1 I I 
orf 12-1 PETGLVSGS PFLKS I WFI FLLFALPGI VYGRVTRSLRGEQE WNAMAESMSTLGLYLVI 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 12a . pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 
I II N I II II I I II N I I N I N I I II I II II I I M N II I II I II I I N 1 1 II I N N I 
orf 12-1 IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 12a. pep AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 
I 1 1 1 1 I II 1 1 1 1 II I I 1 1 II I II I II I I I I I I I 1 1 I II I I M I II II I II I I II I N I I I 
orf 12-1 AVTAPI FVPMLMLAGYAPE VTQAAYRIG DS VTN 1 1 T PMMS YFGL IMATV I KYKKDAGVGT 

430 440 450 460 470 480 

490 500 510 520 

orf 12a . pep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
M II I I I I I II I II II II II If I I I II II I II I I I I I II I I N 
orf 12-1 LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

490 500 510 520 



Homology with a predicted ORF from N.gonorrhoeae

ORF12 shows 92.5% identity over a 320aa overlap with a predicted ORF (ORF12.ng) from N. 
gonorrhoeae: 






orf 12. pep AXXIIHPXXWGPEANWFFMVASTFVIALI 30 

I INI IIIIIMIIIhlllllllll 

orfl2ng AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMAASTFVIALI 232 

orf 12 . pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 90 

I I M I I M I M 1 1 1 M 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 M 1 1 1 I 1 1 M I 1 1 1 1 1 I I II 1 1 1 M I 

orfl2ng GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 292 

orf 12 .pep PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAXAESMS 150 

Nllllllllll 11:11 II I I III i II I I I I II I: I I I Ml i: II I II II II I 

orf!2ng PADGILRHPETGLVAGSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREWNAMAESMS 352 

orf 12 .pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFIJCEVGLGGSVLFIGFILICAFINLM 210 

till I II I I I I I M I I I I M I I I I t I I I I : M I : I I II I II II I II N II I II I 

orf 12ng TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILIGAFINLM 4 12 

orf 12 .pep IGSASAQWAVTAP I FVPMLMLAGYAPEVIQAAYRIGDSVTN I ITPMMSYFGLIMATVXXY 270 

I I M I M N I I M II I II II N I I I r | | 1 1 K | i I | | 1 1 | | | | 1 1 | | | | | 1 1 f I 1 1 1 I 

0rfl2ng IGSASAQWAVTAPIFVPMLMLAGNAPQVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 472 






orf 12 .pep KKDAGVGTLIXMMLPYSAFEXIAWIALFCIWVFVLGLPVGPGAPTFYPAP 320 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I t : I 
orf!2ng KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVP 522 

The complete length ORF12ng nucleotide sequence <SEQ ID 139> is: 

1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGcc tctgCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA 

501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA 

1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA 

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC 

1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC 

1551 ATTCTATCCG GTGCCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 140>: 

1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAVI FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMAASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVAGSP FLKSIVVFIF LLFALPGIVY GRITRSLRGE REVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGTPTFYP VP*

ORF12ng shows 97.1% identity in 522 aa overlap with ORF12-1 : 

10 20 30 40 50 60 

orf 12-1. pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFI I FIVLLLIASAVGAYFGLS VPDPRPVGAK 
I II I I:: 1:1 I I II I I I I III I Ml III I II Ml I I! II I ! M I M II I II I I Ml ! I I I 
orfl2ng MSQTDARRSGRFLRTVEWLGNMLPHPVTLFI I FIVLLLIASAVGAYFGLS VPDPRPVGAK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12-1 . pep GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 
I I I I I 1 1 I : : I I I I : I I I : I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I 
orfl2ng GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 12-1 . pep LLLTKSPRKLTTFMWFTGILSNTASELGYWLIPLSAIIFHSLGRHPLAGLAAAFAGVS 
I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I I I I II | It | | | | | 
orfl2ng LLLTKSPRKLTTFMWFTGILSNTASELGYWLIPLSAVIFHSLGRHPLAGLAAAFAGVS 
130 140 150 160 170 180



190 200 210 220 230 240 

orf 12-1. pep GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 
llll I I I I I I I II II II I II Mlllll I I M II I I I I II I M:|| I M 1 I II I I I Mill 
orfl2ng GGYSANLFLGTI DPLLAGITQQAAQI IHPDYWGPEANWFFMAASTFVIALIGYFVTEKI 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 12-1 . pep VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 
I I Ml III M MM I I 111 I I Mil I IN I I Mill Ml II I III I II I III Mil I II I 
orfl2ng VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 12-1 . pep PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 

I I I I M : I I I M I M I 1 1 I I I I M I I M I i M : I I I II I I : t I I I I I I I I I I I | I I | I I I 
orfl2ng PETGLVAGSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMSTLGLYLVI 

310 320 330 340 350 360 

370 380 390 400 410 420 

or f 12-1 . pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINU4IGSASAQW 
II M 1 1 1 M 1 1 1 II I II M II II I h M II I II I I II II 1 1 1 II I I II II M 1 1 II I I II 
orfl2ng IFFAAQFVAFFNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 12-1. pep AVTAPI FVPMLMLAG YAPEVIQAAYR I GDSVTN I IT PMMS YFGLIMATVIKYKKDAGVGT 

II 1MMI II MM) I MM li MUM I II II II! Ml II II II Ml MM Mill III 
orf!2ng AVTAPI FVPMLMLAGYAPEVIQAAYRIGDSVTNI ITPMMS YFGLIMATVIKYKKDAGVGT 

430 440 450 460 470 480 

490 500 510 520 

or f 12-1 . pep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
1 1 I 1 1 1 1 1 1 1 1 1 I 1 1 K 1 1 I M I 1 1 1 I I 1 1 I M I I : I I 1 1 1 : 1 1 
orfl2ng LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX 

490 500 510 520 

In addition, ORF12ng shows significant homology with a hypothetical protein from E.coli:

sp|P46133|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION
>gi|1787597 (AE000231) hypothetical protein in ogt 5'region [Escherichia coli]
Length = 510
Score = 329 bits (835), Expect = 2e-89

Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)

RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67 
+SG+ VE +GN +PHP +A+ + FG+S +P D 
QSGKLYGWVERIGNICVPHPFLLFIYLIIVLMVTTAILSAFGVSAKNP TDGTP 64 

IHWSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127 
+ V +LL +GL L + +KNF+GFAP +AE+ GL+ ALM + + 

VVVKNLLSVEGLHWFLPNVIKNFSGFAPLGAILALVLGAGIJtfSRV^ 124 

RKLTTFMWFTGILSNTASELGYWLIPLSAVIFHSLGRHPLAGLAAAFAGVSGGYSANL 187 
+ ++MV+F S+ +S+ V++ P+ A+IF ++GRHP+AGL AA AGV G++ANL 



Query: 


8 


Sbjct: 


13 


Query: 


68 


Sbjct: 


65 


Query: 


128 


Sbjct: 


125 


Query: 


188 


Sbjct: 


185 


Query: 


248 


Sbjct: 


245 


Query: 


308 


Sbjct: 


299 


Query: 


368 



+ T D LL+GI+ +AA +P 



+Q + ++ + + S 



NW+FMA+S V+ ++G +T+KI+EP+LG 



GL AGW + A +A ++P +GILR P V 
-GLRI AGWS LLFI AAIALMVI PQNG I LRDPINHT VM 298 



SPF+K IV I L F + + YG TR++R + ++ + M E M + •»-+ 



NW+N+G++IAV 



L+ GL G F+G L+ +F+ + I S SA W++ APIF 
Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487 

VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP 
Sbjct: 419 VPMFMLLGFHPAFAQILFRIADSSVLPLAPVSPFVPLFLGFLQRYKPDAKLGTYYSLVLP 478 

Query: 488 YSAFFLIAWIALFCIWVFVLGLPVGPG 514 

Y FL+ W+ + W +++GLP+GPG 
Sbjct: 479 YPLIFLWWLLMLLAW-YLVGLPIGPG 504 

Based on this analysis, including the presence of several putative transmembrane domains and the 
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein, 
it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
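
The text does not say which program was used to predict the putative transmembrane domains. One widely used heuristic is a Kyte-Doolittle hydropathy scan, and the sketch below assumes that approach purely for illustration: the 19-residue window and the 1.6 threshold are the conventional settings for that scale, not values taken from this document, and the function names are hypothetical. The two test fragments are copied from the ORF12-1 listing (residues 21-50 and 251-272).

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues score 0."""
    scores = [KD.get(res, 0.0) for res in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_tm_candidate(protein, window=19, threshold=1.6):
    """True if any window is hydrophobic enough to suggest a membrane span."""
    profile = hydropathy_profile(protein, window)
    return bool(profile) and max(profile) > threshold


n_terminal = "NMLPHPVTLFIIFIVLLLIASAVGAYFGLS"  # ORF12-1 residues 21-50
hydrophilic = "DLSQEEKDIRHSNEITPLEYKG"         # ORF12-1 residues 251-272
print(has_tm_candidate(n_terminal), has_tm_candidate(hydrophilic))
```

A scan of this kind only flags hydrophobic stretches as candidates; it does not establish topology or surface exposure, which the document addresses experimentally for other ORFs.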

Example 17 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 141>: 

1 ..ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA

51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC 

101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA 

151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG 

201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT 

251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA

301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG 

351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT 

401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC 

451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG 

501 ACT.. 

This corresponds to the amino acid sequence <SEQ ID 142; ORF14>: 

1 . . TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA 

51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI 

101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG 

151 RXLTNPTVSV RIMLHSG. . 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF14 shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) from strain A of N. 
meningitidis: 

10 20 30 

or f 14 . pep TAGAAGXXVFVFVT DSQVEVFGN I QTAVET 

1 = 1111 IIHIII:|::llll:l till 
orfl4a GRQLGFLRVGGALFVITAQARVNNALCDCLTTGAAGFAVFVFVTDGQMQVFGNVQPAVET 
150 160 170 180 190 200 

40 50 60 70 80 90 

orfl4.pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 
tMllillllillllll I! I I I I II I I I I I I I I i II I I I I 1 I I I I I I I M I ! I I I I I I i 
or f 1 4 a G FFHGI S VS S VFGAAAQYSAMASRSAS I PVFS ATEMRTAAI FPAASRHMPVFCSS DGSRS 

210 220 230 240 250 260 

100 110 120 130 140 150 

orfH.pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
II i II I 1 1 1 III! I II I II I II I 1 1 1 II II I II III I II III Ml MM I 1 1 111 I II I I 
orfl4a VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
270 280 290 300 310 320 

160

orf 14 . pep RXLTNPTVSVRIMLHSG 
I I I I I I I I I I I I I I I I 

orf 14a RSLTNPTVSVRIMLHSGLMYSRRAWSSVAKSWSFAYMPDLVSRLNRLDLPTLVX 
330 340 350 360 370 380 

The complete length ORF14a nucleotide sequence <SEQ ID 143> is: 

1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG 

51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG

101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT 

151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA 

201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG 

251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG 

301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA 

351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG 

401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG 

451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA 

501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG 

551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG 

601 AACGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC 

651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG 

701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT 

751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG 

801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT 

851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG 

901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG 

951 CGTGGCGGAA AAGGCGGAAA TCAGTTTGTG CGGCAGATCG CTGACGAATC 

1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC 

1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC 

1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG 

This encodes a protein having amino acid sequence <SEQ ID 144>: 

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF 

51 LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK

101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQVVQ ADLVEDFLGR

151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG 

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF 

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA 

301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR 

351 RAVVSSVAKS WSFAYMPDLV SRLNRLDLPT LV*

It should be noted that this sequence includes a stop codon at position 118. 
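
The position of the internal stop can be confirmed directly from the listings. The following is a minimal sketch with hypothetical helper names: it locates the `*` symbol in the printed row of <SEQ ID 144> that starts at residue 101, and then maps residue 118 back to its codon in the row of <SEQ ID 143> that starts at nucleotide 351.

```python
def stop_positions(protein, first_position=1, stop="*"):
    """Positions of stop symbols, numbered from first_position."""
    seq = "".join(protein.split())  # drop the spaces between 10-mers
    return [first_position + i for i, res in enumerate(seq) if res == stop]


def codon_span(aa_position):
    """1-based (start, end) nucleotide coordinates of a codon."""
    start = 3 * (aa_position - 1) + 1
    return start, start + 2


# Residues 101-120 of <SEQ ID 144> as printed.
row_101 = "LLFDQPDAGG AGDAAEH*NR"
print(stop_positions(row_101, first_position=101))

# Nucleotides 351-360 of <SEQ ID 143> as printed.
row_351 = "TTAAAACCGC"
start, end = codon_span(118)
codon = row_351[start - 351:end - 351 + 1]
print(start, end, codon)
```

Residue 118 corresponds to nucleotides 352-354, which read TAA in the listing, consistent with the stop codon noted above.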
Homology with a predicted ORF from N.gonorrhoeae

ORF14 shows 89.8% identity over a 167aa overlap with a predicted ORF (ORF14.ng) from N. 
gonorrhoeae: 

orf 14. pep TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 30 

It III I I : I I : I : I : : I | I I : I MM 
o r f 1 4 ng GRQFG FFRVGGAS FVITAQAGI DDALCDCLTADAAGFAVFAFVADGQMQVFGNVQPAVET 208 

orf 14 .pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 90 

I INI I Ml MM Ml I Mill III III I II II III MM III MM I MM III MM 
orfl4ng GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 268 

orf 14 . pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 150 

MMMIIMI lit I II I I I II ! I Mill II I II I II i II II : I II : II M M II I M 
orfl4ng VLLYTLMHG I SWAW I SCST FSTSS ICCPLFRAAASTTCSSTSACTVSSKVAEKAE I SLCG 328 

orf 14 . pep RXLTNPTVSVRIMLHSG 1 67 

I I II II I II II II I : I 

orfl4ng RSLTNPTVSVRIMLHAGLMYSRRAWSRVAKSWSFAYMPDLVSRLNRLDLPTLV 382 

The complete length ORF14ng nucleotide sequence <SEQ ID 145> is predicted to encode a protein 
having amino acid sequence <SEQ ID 146>: 

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS GNGKADDVLF AFFLVGGFDF

51 LRVIGCGGVA CLPDFQQNVG EADFAVVPDD AAAVRAVIEV DADDAVCAQK

101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQVVQ ADLVEDFLGR

151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA

301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR

351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV*

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the 
proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies. 



Example 18 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 147>:

1 . .GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT 

51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA 

101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG 

151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC 

201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA

251 AAA.NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC 

301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC 

351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG 

401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG 

451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC 

501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA.CC GCGC.. 

This corresponds to the amino acid sequence <SEQ ID 148; ORF16>: 



1 ..GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL 

51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV 

101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK

151 EYXPETYARY HGIDVAANQE KANWIALLKX A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 149>: 



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC 

151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT 

501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC 

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG 

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT 

1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 
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This corresponds to the amino acid sequence <SEQ ID 150; ORF16-l>: 

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG 

51 ADPHNLGW FF ILPPLAGMLV QPIVG HYSDR TWKPRLGGRR LPYLLYGTLI 

101 AVIVMILM PN SGSFGFGY AS LAALSFGALM IALLDV SSNM AMQPFKMMVG 

151 DMVNEEQKGY AYGIQSFLAN T GAWAAILP FVFAYIGLAN TAEKGWPQT 

201 VWAFYVGAA LLVITSA FTI FKVKEYDPET YARYHGIDVA ANQEKANWIE 

251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TT DAS SVG YQ 

301 EAGNWY GVLA AVQSVAAVIC SFVL AKVPNK YHKAG YFGCL ALGALGFFSV 

351 FFIGNQ YALV LSYTLIGIAW AGII TYPLTI VTNALSGKHM GTYLGLFNGS 

401 ICMP QIVASL LSFVLFPMLG GLQ ATM FLVG GWLLLGAF S VFLIKETHGG 

451 V* 
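
The numbered, ten-residue layout used for the sequences above can be checked mechanically. The following is a minimal sketch, not part of the original disclosure: the helper names `parse_numbered_block` and `inconsistent_rows` are hypothetical. It flattens rows of this layout and flags any row whose residue count disagrees with the position number printed on the next row, which is one way to spot characters lost in reproduction.

```python
import re

def parse_numbered_block(text):
    """Parse rows such as '1351 GTTTGA' into (start_position, residues) pairs."""
    rows = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(\S.*)$", line)
        if match is None:
            continue  # blank line, or a line without a leading position number
        residues = re.sub(r"\s+", "", match.group(2))
        rows.append((int(match.group(1)), residues))
    return rows

def inconsistent_rows(rows):
    """Return start positions of rows whose length disagrees with the next row's number."""
    flagged = []
    for (start, residues), (next_start, _) in zip(rows, rows[1:]):
        if start + len(residues) != next_start:
            flagged.append(start)
    return flagged

# Last two rows of the complete nucleotide sequence <SEQ ID 149>, as printed.
tail = """1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG
1351 GTTTGA"""
rows = parse_numbered_block(tail)
last_start, last_residues = rows[-1]
total_length = last_start + len(last_residues) - 1  # nucleotides in the ORF
codons = total_length // 3                          # includes the stop codon
print(total_length, codons - 1)                     # 1356 451

# Illustrative rows in which some characters were lost (48 and 49 residues).
damaged = """151 DMVNEEQKGY AYGIQSFLAN T GAWAAILP FVFAYIGLAN TAEKGWPQT
201 VWAFYVGAA LLVITSA FTI FKVKEYDPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TT DAS SVG YQ"""
print(inconsistent_rows(parse_numbered_block(damaged)))  # [151, 201]
```

The 1356-nucleotide length corresponds to 451 codons plus a stop codon, which agrees with the 451-residue protein listed above.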

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF16 shows 96.7% identity over a 181aa overlap with an ORF (ORF16a) from strain A of N. 
meningitidis: 

10 20 30 

orf 1 6 . pep GHYSDRTWKPRLXGR RLPYLLYGTLIAVIV 
T _ m Mill I II III I I IN II MM Ml IN I 

orf 16a IFQTLGADPHSLGW FFILPPLAGMLVQPIVG HYSDRTWKPRLGGR RLPYLLYGTLIAVIV 
50 60 70 80 90 100 

40 50 60 70 80 90 

or f 1 6 . pep ^LMPNSGSFGFGY ASLAALSFGT^LMIALLDV SSNMAMOPFKMMVGDMVNEEQKXYAYGT 

1 1 1 m 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 m 1 1 1 1 1 r 1 1 1 1 1 1 1 1 1 1 1 1 1 r 1 1 1 1 1 1 1 1 1 mil 

orf 1 6a MILMPNSGSFGFGY ASLAALSFGALMIALLDV SSNMAMQPFKMMVG 

110 120 130 140 150 160 

100 110 120 130 140 150 

orf 16 . pep QSFLANTG AWAAILPFVFAYIGLAN TAXKGVVPQT VVVAFYVGAALLVITSA FTIFKVK 
M M II I II II I M II II II II II II I I II II II I I I II I I II M I II II M I I II I I I 
orfl6a QS FLANTG AWAAI L PFV FAYIGLA NTAEKG WPQT WVAFYVGAALLVITSA FT I FKVK 

170 180 190 200~ 2l0 220 

160 170 180 

or f 1 6 . pep EYXPETYARYHGI DVAANQEKANWI ALLKXA 
M < I M I I I i r I S I J I I 1 M 1 1 M 111:1 
orf 16a EYNPETYARYHGIDVAANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAI 
230 240 250 260 270 280 

or f 1 6a AENVWHTTDAS SVGYQEAGNWYG VLAAVQS VAAV I CS FVL AKVPNKYHKAG YFGCT ATiftA 

290 300 310 320 330 340 

The complete length ORF16a nucleotide sequence <SEQ ID 151> is: 

1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT 

151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT 

501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC 

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG 
1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT 

1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 152>: 

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG 

51 ADPHSLGW FF ILPPLAGMLV QPIVGH YSDR TWKPRLGGRR LPYLLYGTLI 

101 AV IVMILM PN SGSFGFGY AS LAALSFGALM IALLDV SSNM AMQPFKMMVG 

151 DMVNEEQKGY AYGIQSFLAN TG AWAAILP FVFAYIGLAN TAEKGWPQT 

201 W VAFYVGAA LLVITSA FTI FKVKEYNPET YARYHGIDVA ANQEKANWIE 

251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ 

301 KAfiNWYG VLA AVOSVAAVIC SFVLA KVPNK YHKAGY FGCL ALGALGFFSV 

351 FFIGNOY ALV LSYTLIGIAW AGII TYPLTI VTNALSGKHM GTYLGLFNGS 

401 THMPO IVASL LSFVLFPMLG GL QATM FLVG GWLLLGAFS VFLI KETHGG 

451 V* 

ORF16a and ORF16-1 show 99.6% identity in 451 aa overlap: 

10 20 30 40 50 60 

orfl6a pep MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSC^ISRIFQTLGADPHSLGWFF 

* P * | | | | | | | | | | | | I I II I II I I i I I II I I I I II II I I I i 1 I I I I I I I I I I I I II I : I I I I • 

nrfl6-l MSEYTPQTAKQGLPALAKSTIV?MLSFGFIjGVQTAFTLQSS®1SRIFQTLGADPHNLGWFF 

10 20 30 40 50 60 

70 80 90 100 110 120 

orfl6a Pep ilpplagmlvqpivghysdrtwkprlggrrlpyllygtliavivmilmpnsgsfgfgyas 
I U III IIMIM I I II Ml Mil! MINI I MINI! II II I I IN II III IIIUM 
orfl6-l ilpplagmlvqpivghysdrtwkprlggrrlpyllygtliavivmilmpnsgsfgfgyas 

70 80 90 100 110 120 

130 140 150 160 170 180 

orfl6a peo IAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP 
M 1 1 I I I M I 1 1 M I It ) I M II M M 1 M It t I I M t I I I M M I M I I I II M I I I I t 
orfl6-l LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAWAAILP 

130 140 150 160 170 180 

190 200 210 220 230 240 

or fl 6a pep FVFAYIGLANTAEKGWPQTVVVAFYVGAALLVITSAFTIFKVKEYNPETYARYHGI DVA 

| I || Mil II Ml IMM MM III M Mill I MM III MM I Ml I II I MM I II I 
orfl6-l FVFAYIGLANTAEKGWPQTVWAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA 

190 200 210 220 230 240 

250 260 270 280 290 300 

or f 1 6a . pep ANQEKANW I ELLKTAPKAFWTVTLVQFFCWFAFQYMWT YSAGAI AEN VWHTT DASS VGYQ 

1 K ! I 1 1 1 1 I M II 1 1 I I M 1 1 1 k I 1 1 I I M I I K I I I M M M II M I M 1 1 I II I 

orfl6-l ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTT DAS SVGYQ 

250 260 270 280 290 300 

310 320 330 340 350 360 

orfl6a pep EAGNWYGVIJ^VQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV 
I I 1 1 1 1 1 t 1 1 I I 1 1 I I M I II I 1 1 1 M 1 11 I I 1 1 1 1 1 I ! 1 1 M 1 I IE I I 1 1 I 1 1 M I It I 
or f 1 6-1 EAGNWYGVLAAVQSVAAVI CS FVLAKVPNKYHKAGYFGCLALGALGFFS VFFIGNQYALV 

310 320 330 340 350 360 

370 380 390 400 410 420 

orfl6a.pep LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG 

IMIM I I I M 1 1 I M II II I II II I I M I II I II II II 1 M II M II 1 1 II M 

orfl6-l LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG 

370 380 390 400 410 420 

430 440 450 

or f 16a pep GLQATMFLVGGWLLLGAFSVFLIKETHGGVX 
1 1 1| M 1 1 M I II M I I II I I M I M M I I M 
orfl6-l GLQATMFLVGG WLLLGAFSVFL I KETHGGVX 

430 440 450 
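
The identity figures quoted for such alignments can be reproduced by counting identical columns. The following is a minimal sketch, assuming that percent identity is the number of identical aligned columns divided by the overlap length; the function name `percent_identity` is hypothetical, and the alignment program actually used may treat gaps differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identity over the aligned overlap: identical columns / all aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)

# First 60-residue row of the ORF16a / ORF16-1 alignment (one substitution, S vs N).
orf16a_row = "MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF"
orf16_1_row = "MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF"
print(round(percent_identity(orf16a_row, orf16_1_row), 1))  # 98.3

# Two substitutions in a 451-residue overlap give the reported figure.
print(round(100.0 * (451 - 2) / 451, 1))  # 99.6
```

The second calculation uses the two substitutions visible between <SEQ ID 150> and <SEQ ID 152> (N/S at position 55 and D/N at position 227).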
Homology with a predicted ORF from N. gonorrhoeae 

ORF16 shows 93.9% identity over a 181aa overlap with a predicted ORF (ORF16.ng) from N. 
gonorrhoeae: 



orf16.pep                               GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV 30 
          1:1111111111 I I I I 1 I i I I I I I I I I I I 
orf16ng   HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV 131 

orf16.pep MIIMPNSGSEXSFGYASLAALSFGALMIALLDVSSNMAMQ 90 
          M II I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | I I I | | | | | | | | | | | | | | 
orf16ng   MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKSYAYGI 191 

orf16.pep QSFLANTGAWAAILPFVFAYIGLANTAXKGWPQTWVAFYVGAALLVITSAFTIFKVK 150 
          t I I I I I I llllll II Ml I MM MM I II MM I II III Mill 1:1 III III Ml 
orf16ng   QSFLANTDAWAAILPFVFAYIGLANTAEKGVVPQTVVVAFYVGAALLIITSAFTISKVK 251 

orf16.pep EYXPETYARYHGIDVAANQEKANWIALLKXA 181 
          II I M II I I I I I M I I I I I M II : 111:1 
orf16ng   EYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWTVTPVQFFCWFAFRYMWTYSAGAI 311 



The complete length ORF16ng nucleotide sequence <SEQ ID 153> is: 



1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA 
51 TACTTTTCAA ATGAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT 
101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT 
151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT 
201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG 
251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT 
301 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG 
351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT 
401 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC 
451 TTGTCGTTCG GCGCGCTGAT GATTGCGCTG TTGGACGTGT CGTCGAATAT 
501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC 
551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG 
601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA 
651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG 
701 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC 
751 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC 
801 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA 
851 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC 
901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA 
951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG 
1001 GCGTTTTGGC GGCGGTGTAG 



This encodes a protein having amino acid sequence <SEQ ID 154>: 



1 MIGDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD 
51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD 
101 SGYYSDRTWK PRLGGR RLPY LLYGTLIAVI VMIL MPNSGS FGFGYASLAA 
151 LSFGALMIAL LDV SSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA 
201 WAAILPFVF AYIGLA NTAE KGWPQT VW AFYVGAALLI ITSAFTISKV 
251 KEYDPETYAR YHGIDVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF 
301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV* 



ORF16ng and ORF16-1 show 89.3% identity in 261 aa overlap: 



30 40 50 60 70 80 

orf 16-1 . pep MLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFFILPPLAGMLVQPI-VGHYSDRT 

I :: I I I M : I: It I II 

orfl6ng DVE LRLSRRSDGLY PAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSAD SG YYS DRT 

50 60 70 80 90 100 

90 100 110 120 130 140 

orf 16-1 . pep WKPRLGGRRL P YLLYGTLI AV I VMI LMPN SG S FGFGYASLAALS FGALM I ALLDVS SNMA 
I M I I II MM I I I I I II M II I II I I II I II II I M I I I I I II I I || M M I II II II I 
orfl6ng WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA 
110 120 130 140 150 160 
150 160 170 180 190 200 

orf 1 6-1 . pep MQPFKMMVGDtWNEEQKGYAYGIQSFIJ^TGAVVARILPFVFAyiGLT^TMKGyy^Ty 

or f 1 6na mqp^^GDMWEEQKSYAYGIQSFLANTDAWWVILPFVFAYIGLANTAEKGWPQTV 
OE 9 170 180 190 200 210 220 

210 220 230 240 250 260 

orf 1 6-1 . pep WAFWGAALLVITSAFTIFKVKEYDPETYARYHGID\^ 



orf 1 6ng 



orf 16-1. pep 



|||||||||||:||||||| |||||IIIINIIIIIIIIIIIIIIH:MIIIHI:IM 
WAFYVGAALLIITSAFTISKVKEYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWT 

230 240 250 260 270 280 

270 280 290 300 310 320 

VTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICS 

|| | | | | | | | | | : I I I I I I I I I I I' I I I I I I I I I I I I I : I I I I I I I I I I I I 



r> r f\ fina vTPVQFFCWFAFRYMWTYSAGAIAENVWHTTDASSVGHQEAGNRYGVLAAVX 
° 9 290 300 310 320 330 340 

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
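
The text does not state how the putative transmembrane domains were identified. As an illustration only, a sliding-window hydropathy scan is one common way to screen a sequence for such regions. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a threshold of 1.6; all three are assumptions rather than parameters taken from the analysis above, and the function name `hydrophobic_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the text does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, mean_hydropathy) for every window at or above the threshold."""
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KD[res] for res in segment) / window
        if mean >= threshold:
            hits.append((i + 1, mean))  # 1-based start within the input string
    return hits

# Residues 107 to 146 of the gonococcal protein <SEQ ID 154>.
segment = "RTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYA"
for start, mean in hydrophobic_windows(segment):
    print(start, round(mean, 2))
```

Windows that exceed the threshold only suggest a candidate membrane-spanning stretch; they are not a substitute for the analysis reported above.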

Example 19 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 155>: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG 

251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC 

301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG 

351 CAGCCAGAAT. . . 

This corresponds to the amino acid sequence <SEQ ID 156; ORF28>: 

1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV 
51 VAEDNAQLEK GSLVMMGGKY WFWNPEDSA XXTGILXAGL DKPFQIVXDT 
101 PSYXCHQALP VKLGSXGSQN . . . 
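
The relationship between the undetermined bases (N) in the partial DNA sequence and the X residues in its translation can be illustrated as follows. This is a minimal sketch assuming the standard genetic code; the helper names `translate_codon` and `translate` are hypothetical. A codon containing N is translated only when every possible base gives the same amino acid (for example CGN, which is always arginine); otherwise it is reported as X.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_codon(codon):
    """Translate one codon; 'N' resolves only if every possible base agrees."""
    options = [BASES if base == "N" else base for base in codon]
    if any(base not in BASES for choice in options for base in choice):
        return "X"  # e.g. a '.' placeholder for a missing base
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"

def translate(dna):
    """Translate in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, len(dna) - 2, 3))

# Nucleotides 1 to 30 of <SEQ ID 155>.
print(translate("ATGTTGTTCCGTAAAACGACCGCCGCCGTT"))  # MLFRKTTAAV

# Nucleotides 100 to 150 of <SEQ ID 155>, which contain four undetermined bases.
print(translate("ACAATCACCCGNAAACACGTTGNCAAAGACCAAATCCGNGNCTTCGGTGTG"))
# TITRKHVXKDQIRXFGV
```

The second result matches residues 34 to 50 of <SEQ ID 156>: the two CGN codons are resolved as R, while the two GNC codons remain X.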

Further work revealed the complete nucleotide sequence <SEQ ID 157>: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG 

351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA 

401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG 

551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC 

601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC 

651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG 

701 ATGCCGCCCG CAAATGA 

This corresponds to the amino acid sequence <SEQ ID 158; ORF28-1>: 

1 MLFRKTTAAV LAATLMLNG C TLMLWGMNNP VSETITRKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFWNPEDSA KLTGILKAGL DKPFQIVEDT 

101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS 
201 KLFANILYTP P FLILDAAGA VLALPAAAL G AWDAARK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF28 shows 79.2% identity over a 120aa overlap with an ORF (ORF28a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 28 . pep MLFRKTTAAVIAHTLMLNG CTLMLWGMNNPVSETITRKHVXKDOIRX 

MIMIIIIIM llttllll:|:li||:t IN :|||| ||||| M I M I ! I M M I 
orf 28a MLElCTTAAVI^TmLNG CTVMMWGMNSPFSETTARKHVDKDOIt»VFnwflRnNaoT.R^ 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 28 .pep GSLVMMGGKYWFWNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 
MIMiMIIIIIIIIIIII Nil Mill 11:1 :| : : I I I I I I I I :||| 
orf 28a GS LVMMGGKYWFWN PEDSAKLTG I LKAGLDKQFQMVE PN PRFA- YQALPVKLES PASQN 

70 80 90 100 110 

orf 28a FSTEGLCLRYDTDRPADIAKLKQLEE^VELDNRTIYTRCVSAKGKYYATPQKLNADYHF 
120 130 140 150 160 170 

The complete length ORF28a nucleotide sequence <SEQ ID 159> is: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT 

51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA 

101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC 

301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG 

351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC 

401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC 

451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA 

501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC 

551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG 

601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT 

651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 160>: 

1 MLFRKTTAAV LAATLMLNG C TVMMWGMNSP FSETTARKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFWNPEDSA KLTGILKAGL DKQFQMVEPN 

101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL 

151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK 

201 LFENIAYTPT T LILDAVGAV LALPVAALI A ATNSSDK* 

ORF28a and ORF28-1 show 86.1% identity in 238 aa overlap: 

10 20 30 40 50 60 

orf28a.pep MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK 
           MMMMMIMMIIMU:|:IMI:| Ml : II I I II II II II II II I II I M II 
orf28-1    MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK 
           10 20 30 40 50 60 

           70 80 90 100 110 119 
orf28a.pep GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 
           Ml MM Ml Mill II II MM III III II I ICM :| : I Mill I MM I: Ml 
orf28-1    GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 
           70 80 90 100 110 120 



120 130 140 150 160 170 179 

orf 28a. pep FSTEGLCLRYDTDRPAD I AKLKQLE FEAVE LDNRT I YTRCVSAKGKYYATPQKLNADYHF 

MMIIIIMIIIMIIIMIIII III |:|| II Mill II I M M 1 1 I M 1 1 M I M II 
orf 28-1 FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRT I YTRCVSAKGKYYATPQKLNADYHF 

130 140 150 160 170 180 
180 190 200 210 220 230 

orf 28a . pep EQSVPADIYYTVTKKHTDKSKLFENIAYTPTTLILDAVGAVLALPVAALIAATNSSDKX 
I I I I II M I I I I I : : I I I II I II H Ml I II I 1 : I I I I I I I : I I I {::::: II 
orf28-1 EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAWDAARKX 

190 200 210 220 230 

Homology with a predicted ORF from N. gonorrhoeae 

ORF28 shows 84.2% identity over a 120aa overlap with a predicted ORF (ORF28.ng) from N. 
gonorrhoeae: 

or f 2 8 . pep MLFRKTTAAVLAHTLMLNGCTLMLWGMNN PVSET ITRKHVXKDQIRX FG WAEDNAQLEK 60 

MM I III I Ml II: II 111:11 MM II MM II III Mill I I Ml III Mill 
or f 2 8ng MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK 60 

orf28.pep GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 120 

I M I I M M I II : II I I I f I MM I I I I I I I i I I Mill I I I I I I i r : MM 
orf28ng GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 120 

The complete length ORF28ng nucleotide sequence <SEQ ID 161> is: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT 

51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG 

351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA 

401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG 

551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC 

601 AAGCTGTTTG GAAATATCTT ATATACGCCC CCCTTGTTGA TATTGGATGC 

651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 162>: 

1 MLFRKTTAAV LAATLILNG C TMMLRGMNNP VSQTITRKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT 

101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS 

201 KLFGNILYTP P LLILDAAAA VLVLPMALIA AANSSDK* 

ORF28ng and ORF28-1 share 90.0% identity in 231 aa overlap: 

10 20 30 40 50 60 

orf 28-1 . pep MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSET ITRKHVDKDQIRAFGVVAE DNAQLEK 
I I I M I M II II M I : II I M : II II II I II : M I II M I! II M II II II II M II II 
orf28ng MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 28-1 . pep GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 
M II II I II I II : II II II II I II : II II M II I I I II II II M I II I M M : I : I I I II 
orf28ng GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 28-1 . pep FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 
Ml Ml III II :| MM MM I : II It II I II II II I I M II II II II I I II II II 

orf28ng FSTGGLCLRYDTGRPDDIAKLKQLE FKAVKLDNRTI YTRCVSAKGKYYAT PQKLNADYHF 

130 140 150 160 170 180 

190 200 210 220 230 239 

orf28-1.pep EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAWDAARKX 

I I M I I I II I M II : I I I I I M I : M M II I : I I M M M I I : II I ::|: 
orf28ng EQSVPADIYYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX 
190 200 210 220 230 

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF28-1 (24kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
6A shows the results of affinity purification of the GST-fusion protein, and Figure 6B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA, which gave a positive result. These experiments confirm 
that ORF28-1 is a surface-exposed protein, and that it may be a useful immunogen. 

Example 20 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 163>: 

1 . .GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT 

51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC 

101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA 

151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA 

201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG 

251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA 

301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA 

351 AGAAAATGCC GGTGCCGCCT CTGGT. . 

This corresponds to the amino acid sequence <SEQ ID 164; ORF29>: 

1 . .VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV 
51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK 
101 TKTSIVPQAP FSDRWLEENA GAASG. . 

Further work revealed the complete nucleotide sequence <SEQ ID 165>: 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC 

201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA 

251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA 

301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT 

551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG 

851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT 

901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC 

951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG 

1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG 

1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC 

1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA 
1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA 

1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG 

1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT 

1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA 

1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT 

1451 GA 

This corresponds to the amino acid sequence <SEQ ID 166; ORF29-1>: 



1 MNLPIQKFMM LFAAAISLLQ IPISHAN GLD ARLRDDMQAK HYEPGGKYHL 
51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG 
101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 
151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS 
201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT 
251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS 
301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK 
351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS 
401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY 
451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK* 
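
The position of the partial sequence <SEQ ID 163> within the complete sequence <SEQ ID 165> can be located by sliding the fragment along the reference and counting mismatches. The following is a minimal sketch of an ungapped comparison; the function names `mismatches` and `best_offset` are hypothetical, and undetermined positions in the fragment (written N or as a dot) are treated as wildcards by assumption.

```python
def mismatches(fragment, reference, offset, wildcards="N."):
    """Count positions where the fragment disagrees with the reference at this offset."""
    window = reference[offset:offset + len(fragment)]
    return sum(1 for f, r in zip(fragment, window) if f not in wildcards and f != r)

def best_offset(fragment, reference):
    """Return (offset, mismatch_count) for the best ungapped placement (0-based offset)."""
    candidates = range(len(reference) - len(fragment) + 1)
    scored = ((offset, mismatches(fragment, reference, offset)) for offset in candidates)
    return min(scored, key=lambda pair: pair[1])

# Nucleotides 201 to 300 of the complete sequence <SEQ ID 165>.
reference = (
    "ATTTGATGCAACTGCGGTCAGTCCTGTACTGCCTATTACACACGAACGGA"
    "CAGGGTTTGAAGGTGTTATCGGTTATGAAACCCATTTTTCAGGGCACGGA"
)
# First 50 determined nucleotides of the partial sequence <SEQ ID 163>.
fragment = "GTCAGTCCTGTACTGCCTATTACACACGAACGGACAGGGTTTGAAGGTGT"

offset, count = best_offset(fragment, reference)
print(offset, count)   # 16 0
print(201 + offset)    # 217
```

On this excerpt the start of the partial sequence matches the complete sequence without mismatches from nucleotide 217 onwards.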



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF29 shows 88.0% identity over a 125aa overlap with an ORF (ORF29a) from strain A of N. 
meningitidis: 

10 20 30 

orf 29 . pep VSPVLPITHERTGFEGVIGYETHFSGHGHE 

I : I : I I I I i I I I I I I I : t I I I I I II I I I I I j 
orf29a EPGGKYHLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHE 

50 60 70 80 90 100 

40 50 60 70 80 90 

orf 29 • pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 
I lllll:MII II II III I IN I II I MM I I I I I I I 1 I Mill:: Mlllllllll 
orf 29a VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIY 
110 120 130 140 150 160 



100 110 120 

orf 29 .pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 
I II I 1 1 1 1 1 1 ::l I I: I 1 1 1 M I 1 : 1 1 I I 1 1 1 I 
orf 2 9a XXYVKGTSTKTKSNIVPRAPFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANR 
170 180 190 200 210 220 



orf 29a MDDIRGI VQGAVN PFLMGFQGVGIGAITDSAVS PVT DTAAQQTLQGXNHLGXLS PEAQLA 

230 240 250 260 270 280 

The complete length ORF29a nucleotide sequence <SEQ ID 167> is: 



1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 
51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 
101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 
151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC 
201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 
251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA 
301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 
351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA 
401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 
451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA 
501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT 
551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 
601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 
651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 
701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 
751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 
801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG 
851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC 
901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC 
951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG 
1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN 
1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG 
1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA 
1151 CACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TANNNNNGGG 
1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA 
1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC 
1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT 
1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG 
1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG 



This encodes a protein having amino acid sequence <SEQ ID 168>: 



1 MNXPIQKFMM LFAAAISXLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL 
51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG 
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 
151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS 
201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT 
251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS 
301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX 
351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG 
401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD 
451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL* 



ORF29a and ORF29-1 show 90.1% identity in 385 aa overlap: 



10 20 30 40 50 60 

orf 2 9a . pep MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 
II I I I I J I I I I I I I t I I I I I I 1 I 1 I I I 1 I I I I I I I I I I f I I I I | | | | | | | | 1 I | t t I = 
orf 29-1 MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 29a . pep RVYAVQTFDATAVGPILPITHERTGFEGI IGYETHFSGHGHEVHS PFDNHDSKSTSDFSG 

I I I I I I I I I I I I I : I : I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I 1 I I I I I 
orf 29-1 RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 29a . pep GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYXXYVKGTSTKTKSNIVPR 

I I tl 1 I I II I I I I 1 I I I I I I I I I I I I I II I I I t I I I I I I I I I I I I I I I I M ! : I I I I : 
orf 29-1 GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

or f 2 9a . pep APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDIRGIVQGAVNPFLMG 

II I MINI II I II IN Ml III II III IN MINI II Mill hill I Mil I II I II 
orf 2 9-1 APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 

250 260 270 280 290 300 

or f 2 9a . pep FQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLAAATALQDS AFAVKDGINS 

Ml HUM III II MINIM II! I II I II III MM Ml: MM HIM II III 
orf 29-1 FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 2 9a . pep ARQWADAHPNITATAQTALAVAXAATTVWGGKKVELNPTKWDWVKNTGYXTPAVRTMHTL 
l:MMIMIMMMMI::| II Ml MIMMMIMMIMM MM MM 
orf29-l AKQWADAH PN ITATAQTALS AAEAAGTVWRGKKVELN PTKWDWVKNTGYKKPAARHMQT L 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 29a. pep DGEMAGGNRPPKSITSNSKADASTQPSLQAQLIGEQIXXGHAYNKHVIRQQEFTDLNINS 

MIIIHMI II: II I: I 
orf 29-1 DGEMAGGNKPIKSLP-NSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVK 

370 380 390 400 410 
Homology with a predicted ORF from N. gonorrhoeae 

ORF29 shows 88.8% identity over a 125aa overlap with a predicted ORF (ORF29.ng) from N. 
gonorrhoeae: 

orf 2 9 . pep VSPVLPITHERTGFEGVIGYETHFSGHGHE 30 

1:1:11 II II III I II I I I I IIIIMIMI 
orf29ng EPGGKYHLFGNARGSVKNRVCAVQT FDATAVGP I LP ITHERTG FEGVI GYETHFSGHGHE 102 



orf 29 . pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 90 

I II I I I : I I I I I I I I I II II II II I I ! I I I I I I I I I M I Mill:: MINIMI M 
orf29ng VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY 162 

orf 2 9 . pep S YYVKGTSTKTKT S I VPQAP FS DRWLEENAGAASG 125 

I I : : I I I I I I I I : I II I I I I I II I : I I I I I I I I 
orf 2 9ng SYHIKGTSTKTKINTVPQAPFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANR 222 

The complete length ORF29ng nucleotide sequence <SEQ ID 169> is predicted to encode a protein 
having amino acid sequence <SEQ ID 170>: 



1 MNLPIQKFMM LFAAAISLLQ IPISHAN GLD ARLRDDMQAK HYEPGGKYHL 

51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG 

151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS 

201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT 

251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS 

301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK 

351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD 

401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA 

451 DGKINHRLFV PNQQLPEK* 

In a second experiment, the following DNA sequence <SEQ ID 171> was identified: 



1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc 
51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 
101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG 
151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC 
201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 
251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA 
301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 
351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA 
401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC 
451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA 
501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT 
551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC 
601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA 
651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 
701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA 
751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA 
801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG 
851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC 
901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC 
951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG 
1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA 
1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG 
1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA 
1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT 
1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA 
1251 TTTTCCAATA GGAACTGCAA CTTATGAAGA GGCAGATAGA CTAGGTAAAA 
1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT 
1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT 
1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA 
1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 172; ORF29ng-1>: 



1 MNLPIQKFMM LLAAAISMLH IPISHAN GLD ARLRDDMQAK HYEPGGKYHL 
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG 
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101 
151 
201 
251 
301 
351 
401 
451 



HEVHSPFDNH 
GYPEPQGARD 
RADEAGKLIW 
DSAVSPVTDT 
ARQWADAHPN 
KPAARHMQTV 
AAQDPRLSLA 
RDGTRQYRPP 



DSKSTSDFSG 
IYSYHIKGTS 
ENDPDKNWRA 
AAQQTLQGIN 
ITATAQTALA 
DGEMAGGNRP 
IHEGKKNFPI 
TEKKSQFATT 
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GVDGGFTVYQ 
TKTKINTVPQ 
NRMDDIRGIV 
DLGNLSPEAQ 
VAEAAGTVWR 
PKSITSEGKA 
GTATYEEADR 
GIQANFETYT 



LHRTGSEIHP 
APFSDRWLKE 
QGAVNPFLTG 
LAAASLLQDS 
GKKVELNPTK 
NAATYPKLVN 
LGKIWVGEGA 
IDSNEKRNKI 



ADGYDGPQGG 
NAGAASGFLS 
FQGVGIGAIT 
AFAVKDCTNS 
WDWVKNTGYK 
QLNEQNLNNI 
RQTSGGGWLS 
KNGHLNIR* 



ORF29ng-1 and ORF29-1 show 86.0% identity in 401 aa overlap (in each block below, the upper sequence is orf29ng-1.pep and the lower sequence is orf29-1):

10 20 30 40 50 60 

MNLPIQKFMMLLAAAISMLHIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 

I IMIIM II 1:111 11:1:1111111 || HIM || | MM II Mill || Mill || ||: 
MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK

10 20 30 40 50 60 

70 80 90 100 110 120 

RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 

II M I t t M i II : I r I I 1 M ! I I 1 I I I M 1 t | f | | | M II M I 1 t I I r I I I I M I I I M 
RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

70 80 90 100 110 120 

130 140 150 160 170 180 

GVDGGFTVYQLHRTGSEIHPADGYDGPQGGGYPEPQGARDIYSYHIKGTSTKTKINTVPQ 
MINIM Mill Mlllll MIMIII: M I I II I I M I : : I I I I II II I III 
GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

APFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANRMDDIRGIVQGAVKPFLTG 
M II II II I M II I M I I : I I I I I I I I I M I : II : I I I I I I I I I : I I I I I I I I I I I I I 
APFS DRWLKENAGAASGFFSRADEAGKLIWE S DPNKNWWANRMDDVRGI VQGAVN PFLMG 

190 200 210 220 230 240 

250 260 270 280 290 300 

FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS 
I M I I I I I I I I I II II I II I M 1 1 I | | | M I M : II I I I I I I M II I II I II I II I I I II 
FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 

310 320 330 340 350 360 

ARQWADAHPNITATAQTALAVAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTV 

Nil MIMMilMMIIIIMMMMIMIIIIIMMIMIIIIIIIIII: 

AKQWADAH PN I TATAQTALSAAEAAGTVWRGKKVELN PTKWDWVKNTG YKKPAARHMQTL 

310 320 330 340 350 360 

370 380 390 400 410 419 

DGEMAGGNRPPKSI-TSEGKANAATYPKLVNQLNEQNLNNIAAQDPRLSLAIHEGKKNFP 
I II II I I I M IN : I : : ::|: ::: ::::: 

DGEMAGGNKPIKSLPNSAAEKRKQNFEKFNSNWSSASFDSVHECTLTPNAPGILSPDKVKT 
370 380 390 400 410 420 



420 430 440 450 460 470 479 

orf29ng-1.pep IGTATYEEADRLGKIWVGEGARQTSGGGWLSRDGTRQYRPPTEKKSQFATTGIQANFETY

orf29-1       RYTSLDGKITIIKDNENNYFRIHDNSRKQYLDSNGNAVKTGNLQGKQAKDYLQQQTHIRN

430 440 450 460 470 480 



Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 21 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 173>:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC 
101 ACACGCGGGC AGATGCACCG ATGCAG. . . 

This corresponds to the amino acid sequence <SEQ ID 174; ORF30>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ. . 
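The correspondence between a partial DNA sequence and its amino acid sequence can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE`, `translate` and `SEQ_ID_173` are illustrative and not part of the original disclosure. Codons containing ambiguous bases are reported as `X`, matching the convention used in the sequences of this document.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons not in the table become 'X'."""
    # Keep letters only, so spaces and position numbers can be pasted as-is.
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# Partial meningococcal sequence <SEQ ID 173> as given above.
SEQ_ID_173 = (
    "ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC "
    "CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC "
    "ACACGCGGGC AGATGCACCG ATGCAG"
)

if __name__ == "__main__":
    print(translate(SEQ_ID_173))
```

Run on the 126 nucleotides of <SEQ ID 173>, the sketch yields the 42 residues listed for ORF30.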

Further work revealed the complete nucleotide sequence <SEQ ID 175>: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This corresponds to the amino acid sequence <SEQ ID 176; ORF30-1>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI

101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A)

ORF30 shows 97.6% identity over a 42aa overlap with an ORF (ORF30a) from strain A of N.
meningitidis:

                    10        20        30        40
orf30.pep   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ
            |||||||||||||||||||||||||||||||:||||||||||
orf30a      MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP
                    10        20        30        40        50        60

orf30a      LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI
                    70        80        90       100       110       120

The complete length ORF30a nucleotide sequence <SEQ ID 177> is: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG 

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This encodes a protein having amino acid sequence <SEQ ID 178>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE

51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI

101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 

ORF30a and ORF30-1 show 97.8% identity in 181 aa overlap: 



orf30a.pep  MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP  60
            |||||||||||||||||||||||||||||||||||||||||||||||||||| | |||||
orf30-1     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP  60

orf30a.pep  LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 120
            | |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf30-1     LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 120

orf30a.pep  KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1     KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180

orf30a.pep  FX
            ||
orf30-1     FX
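
The identity figure quoted for a pair of aligned sequences can be recomputed by counting identical columns. The sketch below assumes the two sequences are already aligned to equal length (with `-` for gaps) and that an ambiguous residue `X` never counts as a match; the function name `percent_identity` and the variable names are illustrative only. The input strings are ORF30a <SEQ ID 178> and ORF30-1 <SEQ ID 176> without the terminal stop.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts only when both rows carry the same, non-gap residue.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Rows 1 and 151 are identical in both proteins; rows 51 and 101 differ at X.
ROW_1 = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKE"
ROW_151 = "DNTGKTLPGQGIGRHRPWESKSTDRSWKNRF"

ORF30A = (
    ROW_1
    + "MKXTXGAFLPLXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAI"
    + "PGXVGAAGKVVSFAKYGREIKIGNNMRIAPFGNRTGHPIGKFPHYHRRVT"
    + ROW_151
)
ORF30_1 = (
    ROW_1
    + "MKETEGAFLPLAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAI"
    + "PGGVGAAGKVVSFAKYGREIKIGNNMRIAPFGNRTGHPIGKFPHYHRRVT"
    + ROW_151
)

if __name__ == "__main__":
    print(f"{percent_identity(ORF30A, ORF30_1):.1f}% identity "
          f"in {len(ORF30A)} aa overlap")
```

With the four `X` positions of ORF30a treated as mismatches, 177 of 181 columns agree, which is consistent with the 97.8% stated above.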

Homology with a predicted ORF from N. gonorrhoeae 

ORF30 shows 97.6% identity over a 42aa overlap with a predicted ORF (ORF30.ng) from N. 
gonorrhoeae: 



orf30.pep   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ  42
            |||||||||||||||||||||||||||||||:||||||||||
orf30ng     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP  60

The complete length ORF30ng nucleotide sequence <SEQ ID 179> is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC
51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG
151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT
301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA
351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG
401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT
451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA
501 ATCAAAATCT ACGGACAGAT CATGGAAAAA CCGCTTCTAA



This encodes a protein having amino acid sequence <SEQ ID 180>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG

101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN

151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF* 

ORF30ng and ORF30-1 show 98.3% identity in 181 aa overlap:

                      10        20        30        40        50        60
orf30ng.pep   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1       MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
                      10        20        30        40        50        60

                      70        80        90         100       110
orf30ng.pep   LAILGGAAIGMWTQHGFSYATTGRPASVRDVA--GGLGAIPGDVGAAGKVVSFAKYGREI
              ||||||||||||||||||||||||||||||||  |||||||| |||||||||||||||||
orf30-1       LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI
                      70        80        90       100       110       120

             120       130       140       150       160       170
orf30ng.pep   KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1       KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR
                     130       140       150       160       170       180

             180
orf30ng.pep   FX
              ||
orf30-1       FX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 22 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 181>: 

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTg.CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT 

201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA 

251 TT. . 

This corresponds to the amino acid sequence <SEQ ID 182; ORF31>: 

1 MNKTLYRVIF NRKRGAVXAV AETTKREGKS CADSDSGSAH VKSVPFGTTH 
51 APVCXVTNIF SFSLLGFSLC LAVGTXNIAF ADGI . . 

Further work revealed a further partial nucleotide sequence <SEQ ID 183>: 


1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 
51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 
101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 
151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC 
201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT. . 

This corresponds to the amino acid sequence <SEQ ID 184; ORF31-1>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSDSGSAH VKSVPFGTTH
51 APVCRSNIFS FSLLGFSLCL AVGTANIAFA DGI . . 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae 

ORF31 shows 76.2% identity over a 84aa overlap with a predicted ORF (ORF31.ng) from N. 
gonorrhoeae: 



orf 31 . pep MNKTLYRVIFNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNIF 60 

I I I I I I I I I 1 I I I I I! I I I I I I I I I I I I I I I I I III::|ltl III :: I 

orf31ng MNKTLYRVIFNRKRGAWAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAF 54 

orf 31. pep SFSLLGFSLCLAVGTXNIAFADGI 84 

II I II I I I I I : I I I I I I I II I 

orf31ng C FS ALGFSLCLALGTVN I AFADGI ITDKAAPKTQQAT I LQTGNGI PQVN IQT PTSAGVS V 114 



The complete length ORF31ng nucleotide sequence <SEQ ID 185> is: 



1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT 

51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT 

151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT 

201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG 

251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa cGGCATACCG 

301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA 

351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA 

401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG 

451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC 

501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG 

551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT 

601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA 

651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG
701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA
751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 186>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 
251 NHLDQYGRTS RHS* 

This gonococcal protein shares 50% identity over a 149aa overlap with the pore-forming
hemolysin-like HecA protein from Erwinia chrysanthemi (accession number L39897):

orf31ng  96 GNGIPQVNIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154
            GNG+P VNI TP ++G+S N+Y  F+V NRG ILNN  +  T +QLGG IQ NP L
HecA     45 GNGVPVVNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104

orf31ng 155 ARVVVNQINSSHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQ 214
            A  ++N++ S + S+L GY+EV G+ A VV+ANP GI  +G GF+N  R TLTTG PQ+
HecA    105 AAAILNEVVSPNRSRLAGYLEVAGQAANVVVANPYGITCSGCGFLNTPRLTLTTGTPQFD 164

orf31ng 215 -AGDFSGFKIRQGNAVIAGHGLDARDTDF 242
             AG  SG  +R G+ +I G GLDA  +D+
HecA    165 AAGGLSGLDVRGGDILIDGAGLDASRSDY 193

Furthermore, ORF31ng and ORF31-1 show 79.5% identity in 83 aa overlap: 

10 20 30 40 50 60 

orf 31-1 .pep MNKTLYRVI FNRKRGAWAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCRSNIFS 

I II I I I I I I II I II I I I I I I M I I I I I I I I I I I I lll::llll III I: I 

orf 31ng MNKTLYRVI FNRKRGAWAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAFC 

10 20 30 40 50 

70 80 
orf 31-1. pep FS LLG FSLCLAVGTANIAFADG I 
I I I I I I I I I I : II : I I I II I I I 
or f 3 lng FS ALGFSLCLALGTVNIAFADGI ITDKAAPKTQQATILQTGNGI PQVNIQTPTSAGVSVN 

60 70 80 90 100 110 

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that 
the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 

Example 23 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 187>: 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA 

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG. . 

This corresponds to the amino acid sequence <SEQ ID 188; ORF32>: 

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 189>: 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA 
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 






151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC 

251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG 

301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG 

351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG 

401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG 

451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC 

501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC 

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA 

601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA 

651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG 

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG 

751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT 

801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT 

851 TTTGGCACAT CTACCCGCAA GACGAGAATG TCCATCTCGA CAAACTCCAC 

901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC 

951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA 

1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG 

1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC 

1101 TCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG 

This corresponds to the amino acid sequence <SEQ ID 190; ORF32-1>:

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR

51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL

101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG

151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR

201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV

251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH

301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW

351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR*

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N.meningitidis (strain A)

ORF32 shows 93.8% identity over a 81aa overlap with an ORF (ORF32a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf32.pep   MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
            ||||||    |||||||||||||||||||||||||||||||||||||||||||||||||
orf32a      MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                    10        20        30        40        50        60

                    70        80
orf32.pep   CVHQDIHVRTWHSDAADIDTA
            |||||||||||||||||||||
orf32a      CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                    70        80        90       100       110       120

The complete length ORF32a nucleotide sequence <SEQ ID 191> is: 

1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG TCATCGACAA 

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC 

251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG 

301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG 

351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA 

401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA 

451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC 

501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC 

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA 

601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA 

651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG 

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG 

751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT

801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT

851 TTTGGCACAT CTACCCGCAA GATGAGAATG TCCATCTCGA CAAACTCCAC 

901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGCATCGGC 

951 ACACCGCCGC CTTTCAGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA 

1001 CACAACGCCT CGAATGTTGG CAAATCCTGC AACAACATCA AAACGGCTGG

1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTTGGGC AGCCTTCCGC 

1101 ATCCGAAAAA CTCGCCGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG 

This encodes a protein having amino acid sequence <SEQ ID 192>: 

1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 

51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL

101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG 

151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR 

201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV 

251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH 

301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW

351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR* 

ORF32a and ORF32-1 show 93.2% identity in 382 aa overlap:

                      10        20        30        40        50        60
orf32-1.pep   MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
              ||||||    |||||||||||||||||||||||||||||||||||||||||||||||||
orf32a        MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf32-1.pep   CVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE
              ||||||||||||||||||||||| ||||||||||||||||||||||||||| |||||||
orf32a        CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf32-1.pep   SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS
              |||||| ||||||:| | |||||||| |||||||||||||||||: |||:||||||||
orf32a        SNERLHXMPSPQESVXKXFWFMGFSEXSGGLIRERDYCEAVRFDSGALRKRLMLPEKNXP
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf32-1.pep   EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA
              |||||||||||||||||||||||||:||||||: |||||||:||||||||||||||||||
orf32a        EWLLFGYRSDVWAKWLEMWRQAGSPLTLLLAGAXIIDSLKQNGVIPQDALQNDGDVFQTA
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf32-1.pep   SVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
              ||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||
orf32a        SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
                     250       260       270       280       290       300

                     310       320       330       340       350       360
orf32-1.pep   AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY
              ||||||||||||||:|||||||||||||||||||||||||| ||||||||||||||||||
orf32a        AFWDKAHGFYTPETASAHRRLSDDLNGGEALSATQRLECWQILQQHQNGWRQGAEDWSRY
                     310       320       330       340       350       360

                     370       380
orf32-1.pep   LFGQPSAPEKLAAFVSKHQKIRX
              ||||||| |||||||||||||||
orf32a        LFGQPSASEKLAAFVSKHQKIRX
                     370       380



Homology with a predicted ORF from N. gonorrhoeae

ORF32 shows 95.1% identity over a 82aa overlap with a predicted ORF (ORF32.ng) from N. 
gonorrhoeae: 

orf32.pep     MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP  57
              |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng     MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP  60

orf32.pep   DVPCVHQDIHVRTWHSDAADIDTA  81
            ||| ||||||||||||||||||||
orf32ng     DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS 120

An ORF32ng nucleotide sequence <SEQ ID 193> was predicted to encode a protein having amino 
acid sequence <SEQ ID 194>: 

1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS

51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE 

101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK 

151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD 

201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI 

251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD 

301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL* 

Further sequencing revealed the following DNA sequence <SEQ ID 195>: 

1 . ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA 

51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC 

101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG 

151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT 

201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT ACCGCGCCCG 

251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG 

301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT 

351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG 

401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC 

451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA 

501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC 

551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG

601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT

651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC CTGCAAAAtg

701 aaggcgGTGT CTTTCagacG gcatccgTcC gccttGTCAA AAtcCCGTTC 

751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT 

801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT 

851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC 

901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC 

951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG 

1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC 

1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC 

1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT 

1151 AG 

This encodes a protein having amino acid sequence <SEQ ID 196; ORF32ng-1>:

1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL 

51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV 

101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG 

151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW 

201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF 

251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL 

301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG 

351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR* 

ORF32ng-1 and ORF32-1 show 93.5% identity in 383 aa overlap:

                       10        20        30        40        50       59
orf32-1.pep   MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
              |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng-1     MNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
                      10        20        30        40        50        60

             60        70        80        90       100       110      119
orf32-1.pep   PCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAE
              | ||||||||||||||||||||||||:||||||||||||||:||||||||||||||||||
orf32ng-1     PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE
                      70        80        90       100       110       120

            120       130       140       150       160       170      179
orf32-1.pep   ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA
              |||||||||||||||||||||||||||||||||||||| |||||||||||:||:||||||
orf32ng-1     ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA
                     130       140       150       160       170       180

            180       190       200       210       220       230      239
orf32-1.pep   SEWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQT
               ||||||||:|||||||:||:|||| |||||||:||||||||||||||:||||:| ||||
orf32ng-1     PEWLLFGYRGDVWAKWLDMWQQAGSLMTLLLAGAQIIDSLKQSGVIPQNALQNEGGVFQT
                     190       200       210       220       230       240

            240       250       260       270       280       290      299
orf32-1.pep   ASVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKL
              |||||||||||||||||:||||||||||||||||||:|||||||||||||||||||||||
orf32ng-1     ASVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRTQLAGKPFFWHIYPQDENVHLDKL
                     250       260       270       280       290       300

            300       310       320       330       340       350      359
orf32-1.pep   HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
              ||||||| |||||||:|:|| |||||||||||||||||||||||||||||||||||||||
orf32ng-1     HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
                     310       320       330       340       350       360

            360       370       380
orf32-1.pep   YLFGQPSAPEKLAAFVSKHQKIRX
              |||||||| |||||||||||||||
orf32ng-1     YLFGQPSASEKLAAFVSKHQKIRX
                     370       380

On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins,
it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.
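
The presence of an RGD tripeptide can be confirmed by a plain substring search. The sketch below is illustrative only: `find_motif` is a hypothetical helper, and the two input strings are just residues 151-200 of ORF32ng-1 <SEQ ID 196> and of ORF32-1 <SEQ ID 190> as listed above, so positions are reported relative to a stated first residue number.

```python
def find_motif(seq: str, motif: str, first_position: int = 1) -> list[int]:
    """Return the 1-based start positions of every occurrence of motif in seq."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(first_position + start)
        # Continue one residue further so overlapping hits are also found.
        start = seq.find(motif, start + 1)
    return hits


# Residues 151-200 of the gonococcal and meningococcal proteins.
ORF32NG_1_151_200 = "GLIRERDYREAVRFDTEALRRRLVLPEKNAPEWLLFGYRGDVWAKWLDMW"
ORF32_1_151_200 = "LIRERDYCEAVRFDTEALRERLMLPEKNASEWLLFGYRSDVWAKWLEMWR"

if __name__ == "__main__":
    print("ORF32ng-1:", find_motif(ORF32NG_1_151_200, "RGD", first_position=151))
    print("ORF32-1:  ", find_motif(ORF32_1_151_200, "RGD", first_position=151))
```

In this window the motif occurs once in the gonococcal sequence (starting at residue 189), while the meningococcal sequence carries RSD at the corresponding place.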

ORF32-1 (42kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
7A shows the results of affinity purification of the His-fusion protein, and Figure 7B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for ELISA, giving a positive result. These experiments confirm that
ORF32-1 is a surface-exposed protein, and that it is a useful immunogen.
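
A rough molecular mass can be estimated from the amino acid sequence by summing average residue masses and adding one water molecule. The sketch below is an approximation under stated assumptions: the residue masses are standard average values, the fusion tags (His, GST) and any leader processing are ignored, and the names `RESIDUE_MASS`, `average_mass` and `ORF32_1` are illustrative. The result is only expected to fall in the same range as the size quoted above, not to reproduce it exactly.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def average_mass(protein: str) -> float:
    """Approximate mass in Da of an unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


# ORF32-1 <SEQ ID 190>, 382 residues, terminal stop omitted.
ORF32_1 = (
    "MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALR"
    "ALCPDLPDVPCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVL"
    "HIIRRHKPLWLNWEYLSAEESNERLHLMPSPQEGVQKYFWFMGFSEKSGG"
    "LIRERDYCEAVRFDTEALRERLMLPEKNASEWLLFGYRSDVWAKWLEMWR"
    "QAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTASVRLVKIPFV"
    "PQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH"
    "AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGW"
    "RQGAEDWSRYLFGQPSAPEKLAAFVSKHQKIR"
)

if __name__ == "__main__":
    print(f"{len(ORF32_1)} residues, about {average_mass(ORF32_1) / 1000:.1f} kDa")
```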

Example 24 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 197>:

1 . .TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG 

51 GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG 

101 ACGAGTGGCG GCA.ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC 

151 AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT 

201 GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA

251 ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA 

301 CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA 

351 CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA 

401 TCGCCTGCTA NGGCATCCTG CCGCGCCTG. . 

This corresponds to the amino acid sequence <SEQ ID 198; ORF33>:

1 . . LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH 
51 SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK 
101 LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL. .

Further work revealed the complete nucleotide sequence <SEQ ID 199>:

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA 

51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG 

151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT 

251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG 

451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 

501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT 

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC 

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA 

951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG 

1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC 

1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A 

This corresponds to the amino acid sequence <SEQ ID 200; ORF33-1>: 

1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM 

51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL 

101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL 

151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 

201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 

251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA 

301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA 

351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 

401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ* 
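The correspondence between a nucleotide sequence and its amino acid sequence can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the choice to report any codon containing a non-ACGT symbol (such as N) as X are illustrative assumptions, not part of the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon.

    Whitespace is ignored and case is folded. Any codon that contains
    an ambiguous symbol (e.g. N) is reported as 'X', even where the
    ambiguity would not change the residue (a simplifying assumption).
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


if __name__ == "__main__":
    # First 30 bases of SEQ ID 199 as printed in the listing.
    print(translate("ATGTTGAATC CATCCCGAAA ACTGGTTGAG"))
```

Applied to the first 30 bases of SEQ ID 199, the sketch returns the first ten residues of SEQ ID 200; applied to the last twelve bases it returns the C-terminal residues followed by the stop symbol.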

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF33 shows 90.9% identity over a 143aa overlap with an ORF (ORF33a) from strain A of N. 
meningitidis: 



10 20 30 

orf 33 . pep LFLRVKVGRFFS S PATWFRXKD PVNQAVLR 

I I I I I I I II I I I I I I I I I I llllllllli 
orf 33a LMDNQGLN FFLVLAGVXGMNT LMLAVW LAML FLRVKVGRFFS S PATW FRGKD PVNQAVLR 

90 100 110 120 130 140 

40 50 60 70 80 90 

orf 33 . pep LYXDEWRXTSVRWKIXATSHSLW LCTLLGMLVSVLLLLLVR QYTFNWESTLLSNAASVRA 

I! Mill llllll M I M I 1 I I I II II I I I I I I I I M I 11 I M I M ! M : : : : I t t 
orf 33a LYADEWRXPSVRWKIGATSHSLW LCTLLGMLVSVLLLLLV RQYTFNWESTLLGDSSSVRL 
150 160 170 180 190 200 



100 110 120 130 140 

or f 33 . pep VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSG LLVXSIACXGILPRL 
IMIIIII:IIMIMiil:lllllilllllilMIIIMI Mil llllll 
orf 33a VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWS GLLVGSIACYGILPRLLAW AVCK 
210 220 230 240 250 260 



orf33a 



ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE 
270 280 290 300 310 320 
The complete length ORF33a nucleotide sequence <SEQ ID 201> is: 

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA 

51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG 

151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT 

251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG 

451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 

501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC 

651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT 

801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC 

851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA 

951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG 

1151 TGTTGCGGCA GATCGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC 

1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA 

This encodes a protein having amino acid sequence <SEQ ID 202>: 

1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM 

51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL 

101 VLAGVXGMNT LMLAVW LAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL 

151 YADEWRXPSV RWKIGATSHS LW LCTLLGML VSVLLLLLVR QYTFNWESTL 

201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 

251 GSIACYGILP RLLAW AVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA 

301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA 

351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 

401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT* 

ORF33a and ORF33-1 show 94.1% identity in 444 aa overlap: 

10 20 30 40 50 60 

orf 33a . pep MLNPSRKLVELVRILEEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAKMI DRNRMLRET 

lilMlllill II III IN j| Ml Ml Mllll III III III ||: | IMIIM M| 
orf 33-1 MLNPSRKLVE LVRILDEGGFI FSGDPVQATEALRRVDGSTEEKI IRRAEMI DRNRMLRET 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 33a . pep LERVRAGSFWLWAAATFAFXTXFSVTYLLMDNQGI^FFL VLAGVXGMNT LMLAVWLAML 

, lllllllllllll: I I Mllll III II I Mill Ml I Mllll II II 

orf 33-1 LERVRMSFWLWWAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVIiGMNT LMLAVWLAML 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 33a . pep FLRVKVGRFFS S PATW FRGKDPVNQAVLRL YADEWRXPSVRWKI GAT SHS LWLCTLLGML 

MM! Ml II II I MM MM Mil III II MM II I I I M I M I M I I I 1 1 II I M II 
orf 33-1 FLRVKVGRFFS SPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGAT SHS LWLCTLLGML 

130 140 150 160 170 180 

190 200 210 220 230 . 240 

orf 33a . pep VSVLLLLLVRQYT FNWEST LLGDS S S VRLVEMLAWLPAKLGFPVPDARAVIEGRLNGN IA 

Mill Ml Ml Mill:::: III I I II I I I I : I I I I II II I II I I Ml 

orf 33-1 VSVLLLLLVRQYT FNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 33a. pep DARAWSGLLVG SI AC YGILPRLLAWAVCKILXXTSENGLDLEKXXXXXX I RRWQNKITDA 

M I I M I I I I I I I I I I I II I I M I I : I I I 1 I MINIUM IMIIMIMI 
orf33-1 DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA 

310 320 330 340 350 360 

DTRRETVSAVSPKIVLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE 

I I I | | | 1 1 1 ! 1 I I I = I 1 1 1 I I 1 1 1 I 1 1 1 1 I I 1 1 I ■ ■ 1 I > I I I t I 1 1 1 I 1 1 * 1 1 I 1 1 I II I 
DTRRET VS AVS PKI I LNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 
ortJJ 310 320 330 340 350 360 

370 380 390 400 410 420 

TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWXLLAEQGLSDDLSEKLEHW 

i it in in mi mi Minim 1 1 nun in inn mim m 'i mi mil 

^€11-y TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWQLLAEQGLSDDLSEKLEHW 
oriJ 370 380 390 400 410 420 



orf33a.pep 



orf 33a. pep 



430 440 450 

orf33a Pep RNALTECGAAWLEPDRAAQEGRLKTNDRTX 

I 1 M : i I I I i M M I I I I M 1 M I 
or f 3 3- 1 RNALAECGAAWLE PDRAAQEGRLKDQX 

430 440 



Homology with a predicted ORF from N. gonorrhoeae 

ORF33 shows 91.6% identity over a 143aa overlap with a predicted ORF (ORF33.ng) from N. 
gonorrhoeae: 

orf33.pep                                LFLRVKVGRFFSSPATWFRXKDPVNQAVLR  30
                                         ||||||||||||||||||| | ||||||||
orf33ng    LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR 100

orf33.pep  LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  90
           || | ||  |||||| || |||||||||||||||||||||||||||||||||||||||||
orf33ng    LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 160

orf33.pep  VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL 143
           ||||||||||||||||||| ||||||||||||||||||||| || | ||||||
orf33ng    VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWVVCK 220

An ORF33ng nucleotide sequence <SEQ ID 203> was predicted to encode a protein having amino 
acid sequence <SEQ ID 204>: 

1 MIDRDRMLRD TLERVRAGS F WLWVWASMM FTAGFS GTYL LMDNQGLNFF 

51 LVLAGVL GMN TLMLAVW LAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR 

101 LYADQWRQPS VRWKIGATAH SL WLCTLLGM LVSVLLLLLV RQYTFNWEST 

151 LLSNAA SVRA VEMLA WLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL 

201 VGSIVCYGIL PRLLAWWCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD 

251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV 

301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA 

351 WQLLAEQGL SDDLSEKLEH WRNALTECGA AWLEPDRVAQ EGRLKDQ* 

Further sequence analysis revealed the following DNA sequence <SEQ ID 205>: 

1 ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa 

51 agggggtTTT attttcagcg gcgatcctgt gcaggcgacg gaggctttgc 

101 gccgcgtgga cggcAGTACG GAggAaaaaa tcttccgtcg GGCGGAGAtg 

151 atcgACAGGg accgtatgtt gcgggACaCg TtggaacGTG TGCGTGCggg 

201 qtcgtTctgG TTATGGGTGG TggtggCAtC gATGATGTtt aCCGCCGGAT 

251 TTTCAGgcac ttatCttCTG ATGGACaatC AGGGGCtGAA TtTCTTTTTA 

301 GTTTTggcgG GAGTGTtggG CATGaatacG ctgATGCTGG CAGTATGGtt 

351 gGCAACGTTG TTCCTGCGCG TGAAAGTGGG ACGGTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GGCCCTGTAA ATCAGGCGGT GTTGCGGCTG 

451 TATGCGGACC AGTGGCGGCA ACCTTCGGTA CGATGGAAAA TAGGCGCAAC 

501 GGCGCACAGC TTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGCTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGTC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG TCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGTAGT 
801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGattgGAT TTGGAAAAAA 

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCcgaAAA TCGTCTTGAA 

951 CGATGCGCCG AAATGGGCGC TCATGCTGGA GACCGAGTGG CAGGACGGCC 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TACGCGCCCA AACTGTGCCG GACCGGGGCG 

1151 TGCTGCGGCA GATTGTGCGG CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTTGAGC 

1301 CTGACAGGGT GGCGCAGGAA GGCCGTTTGA AAGACCAATA A 

This encodes a protein having amino acid sequence <SEQ ID 206; ORF33ng-l>: 

1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM 

51 IDRDRMLRDT LERVRAGSFW LWVVVASMMF TAGFSGTYLL MDNQGLNFFL 

101 VLAGVLGMNT LMLAVW LATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL 

151 YADQWRQPSV RWKIGATAHS L WLCTLLGML VSVLLLLLV R QYTFNWESTL 

201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 

251 GSIVCYGILP RLLAWVVCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA 

301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA 

351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 

401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ* 

ORF33ng-1 and ORF33-1 show 94.6% identity in 446 aa overlap: 

10 20 30 40 50 60 

orf 33-1 -pep MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET 
M M I I I I I I I I I I I :: I I I I I I ! i I I I M I I I I I i I I I M I 1 I : I I I I i I I t : t t I I : I 
orf33ng-l MLNPSRKLVELVRILNKGGFIFSGDPVQATEALRRVDGSTEEKIFRRAEMI DRDRMLRDT 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 33-1. pep LERVRAGS FWLWWAAT FAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML 
I 1 1 I 1 1 1 1 1 1 1 1 1 1 : I : : I : I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I | 
orf33ng-l LERVRAGSFWLWVWASMMFTAGFSGTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLATL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 33-1 . pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 
I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I : II I I I i I I I I I I 
orf33ng-l FLRVECVGRFFSSPATWFRGKGPVNQAVLRLYADQWRQPSVRWKIGATAHSLWLCTLLGML 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 33-1 . pep VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

I I II 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 It 1 1 1 1 1 1 1 1 1 Ml 1 1 1 1 1 1 1 1 1 1 II 1 1 1 II 1 1 
or f 3 3ng- 1 VSVLLLLLVRQYT FNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 33-1 . pep DARAWSGLLVGSIACYGILPRLLAWWCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA 
1 I I I 1 I I 1 I t 1 1 I x I I 1 I | | I | | | | | I 1 | | 1 | 1 | | J | t f 1 I I I I 1 I ! I I 1 1 1 I 1 I I 1 i I 
orf33ng-l DARAWSGLLVGSIVCYGILPRLLAWWCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 33-1 . pep DTRRETVSAVS PKI ILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 
I ( 1 1 1 I 1 1 1 1 1 1 II : I 1 1 1 1 1 1 1 : 1 I 1 1 1 1 1 1 I : I 1 1 1 I 1 1 1 1 I 1 1 II I h 1 1 1 I 1 1 1 I I 
orf33ng-l DTRRETVSAVS PKI VLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 33-1 - pep TEIiKQKPAQLLIGVRAQTVPDRGVI^QIVRLSEAAQGGAWQLLAEQGLS DDLSEKLEHW 
I M I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 
orf33ng-l TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWQLLAEQGLSDDLSEKLEHW 

370 380 390 400 410 420 

430 440 
orf 33-1. pep RNALAECGAAWLE PDRAAQEGRLKDQX 
1111:11 I I (I I I i I I : I I | I | | { | II 
orf33ng-1 RNALTECGAAWLEPDRVAQEGRLKDQX 
~ 3 430 440 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
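The specification does not state which program produced the transmembrane predictions. As one possible illustration only, a sliding-window hydropathy scan can flag candidate membrane-spanning stretches. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean score cut-off of 1.6, which are commonly used conventions; the function name, the window size and the cut-off are assumptions introduced here, not the method used in the source.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (1-based start, mean hydropathy) for each window whose
    mean score reaches the threshold.

    Unknown residues (e.g. X) contribute 0.0, an illustrative choice.
    """
    seq = "".join(protein.split()).upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean >= threshold:
            hits.append((start + 1, round(mean, 2)))
    return hits


if __name__ == "__main__":
    # A hydrophobic stretch taken from the ORF33-1 sequence listed above.
    print(hydrophobic_windows("LWLCTLLGMLVSVLLLLLV"))
```

Overlapping windows above the cut-off would normally be merged into a single candidate segment; that step is omitted to keep the sketch short.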

Example 25 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 207>: 

1 CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT 

51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT . GAGTGCG 

101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG 

151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC . GGCGT 

201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG. .GTTTGA 

251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG 

301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC 

351 GGGTTGGGCG GCATCTTGTJ CCGACTACGC CGTTTGGCAG CCAGAATTCG 

401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC 

451 GTCC. . 

This corresponds to the amino acid sequence <SEQ ID 208; ORF34>: 

1 QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV 

51 GSTGVSLSVF SACVXGVVRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS 

101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS 

151 S.. 

Further work revealed the complete nucleotide sequence <SEQ ID 209>: 

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT 

51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG 

151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC 

201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC 

251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT 

301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT 

351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT 

401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT 

451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT 

501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA 

551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC 

601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG 

651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG 

701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT 

751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC 

801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT 

851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA 

901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA 

951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG 

1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG 

1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT 

1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA 

1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT 

1201 CGCGCCGACG GCGGCGCTTC CGATTACTGC GCTGACGCAG CCGCTAAGGG 

1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT 

1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC 

1351 CATGCCGTCT GA 

This corresponds to the amino acid sequence <SEQ ID 210; ORF34-1>: 

1 M MMPFIMLPW IAGVPAV PGQ NRLS RISLWG LGGVFFGVSG LVW FSLGVSL 

51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG 

101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN 

151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV 
201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDVVLV EGDDFLYADG 
251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP 
301 SVAGDVAGSA RQGGDGNIVV HAFGGLFGTC NLTDELFFAF GGDLSEQQQV 
351 AVVADDGDLG RVAFGLVVLA QIGTGGGFDT QRHNVVVGLR AGGSAVDGGF 
401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR 
451 HAV* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF34 shows 73.3% identity over a 161aa overlap with an ORF (ORF34a) from strain A of N. 
meningitidis: 



or f 3 4. pep 
orf34a 

or f 3 4. pep 
orf34a 

orf 34 .pep 
orf34a 



10 20 30 

QKSLSR ISLWGLGGVFFGVSGLVW FSLG VSXE- 
I I 



-CAC 



I I i I I I I I I I I I I I I 1 I I I I I I I I I I Ml 
MMXPXIMLPWIAGVPA VPGQKRLS RXSLWGLGGXFFGVSGLVW FSL GVSXSLGVSXGCAC 
10 20 30 40 50 60 

40 50 60 70 80 90 

FSGV SFRGSGRG TFVGSTGVSLSVFSACVX GVVRLPVGLSCVGRLXX LTRFFLGA 

I I I I I II I I I I I I I I I I I I I I I It I I I I : I:: : I : : 111 I II 

FSGV SFRGSGRG TFVGSTGVSLSVFSACA -: PASSGCLSVXAVSAGCGLTRXFXGA 

70 80 90 100 110 

100 110 120 130 140 150 

AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 
Ml II I II I I I I I I I : I I I M M II M I II Ml M II II II II II II I : II II 
AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS 
120 130 140 150 160 170 



orf 34 .pep 
orf34a 



PFGXNVLTMPIANAPMAVIQMSNTARIRSL GVSLKGLFXFFAILIVLL GCRAMPSEGGSD 
180 190 200 210 220 230 



The complete length ORF34a nucleotide sequence <SEQ ID 211> is: 

1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 
51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN 
101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT 
151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG 
201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG 
251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT 
301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA 
351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG 
401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG 
451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 
501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC 
551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 
601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT 
651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 
701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC 
751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 
801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG 
851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG 
901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG 
951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG 
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG 
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG 
1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT 
1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG 
1201 GCGGTCGACG GCGGATTTCG CGCCGAGCGC CGCGCCGCCG ACGACTGCGC 
1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG 
1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA 
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 
This encodes a protein having amino acid sequence <SEQ ID 212>: 

1 MMXPXIMLPW IAGVPA VPGQ KRLS RXSLWG LGGXFFGVSG LVW FSLGVSX 

51 SLGVSXGCAC FSGV SFRGSG RG TFVGSTGV SLSVFSACA P ASSGCLSVXA 

101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT 

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR 

201 IRSL GVSLKG LFXFFAILIV LLG CRAMPSE GGSDGIAESA LDWXVEGDD 

251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA 

301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL 

351 SEQQQVAWA DNGDLG RVXF GLWLAQIGA GGGF DTQRHY WVGXRAGGS 

401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS 

451 DGIALRHAV* 

ORF34a and ORF34-1 show 91.3% identity in 459 aa overlap: 



10 20 30 40 50 60 
or f 34a . pep MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC 
II I MllllllllllllhlMI 1 I I 1 I i 1 MINIMUM Ml Mil 
orf34-l MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL GCAC 

10 20 30 40 50 



70 80 90 100 110 120 

orf 34a . pep FSGVSFRGSGRGTFVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRX FXGAAGDGSP 

I I I t f I ! t I I 1 1 I 1 I I I 1 1 1 1 1 I 1 I t I J i I 1 1 1 1 ( f I I I E I M ! I I E I 1 I I 1 1 1 1 I I I 
orf 34-1 FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 
60 70 80 90 100 110 



130 140 150 160 170 180 

or f 34a . pep LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV 
I 1 I I I I I I I i I 1 * I I 1 I I M I I I I I M I I I II I M I I I I M I M I : lllllll II 
or f 3 4 - 1 LPLS SVPSGCAGS DEAAWWCSGWAASCPTTPFGSQNS VSRGLSVCCGSAXRVLS PFGLNV 

120 130 140 150 160 170 

190 200 210 220 230 240 

or f 34a . pep LTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSDGIAESA 
I I I I I I I I I I I : I I I I I II I I I I I I I I I I II I I I I I I II I I I I I I I I I I II I I II I I I I 
orf 34-1 LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 
180 190 200 210 220 230 



250 260 270 280 290 300 

orf 34a . pep LDVVXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
I I I I I I I I I I I I M M M I I I ! I I I M I I I M I I I I I I I I II I M I I I M II I I I M I I 
orf 34-1 LDWLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
240 250 260 270 280 290 



310 320 330 340 350 360 

orf 34a . pep DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAWA 
I I I I I I I I I I I I M I I M I I I I I : I i t 1 1 1 I I 1 t t I 1 I I I I i r 1 | i | f I | I | 1 1 | I | | | 
orf 34-1 DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 
300 310 320 330 340 350 

370 380 390 400 410 420 

orf 34a . pep DNGDLGRVXFGL WLAQIGAGGGFDTQRH YVWGXRAGG S AVDGG FRADRRAADDCADAA 

I : I I I I I I I I II I I I I II : I I I I I I II I I I I I ! I I I I I I I M I I I I Ml I I I I I 
or f 3 4 - 1 DDGDLGRVAFGLWLAQIGTGGGFDTQRHNWVGLRAGGSAVDGGFRADGGASDYCADAA 
360 370 380 390 400 410 

430 440 450 460 

orf 34a . pep AEGKAE DGGS QGADGVRFG FHRVLP FLGVS DGI ALRHAVX 

I : I I I I : I I : I I I I I I I I I M M I I I i I I I I I I I I M I II 
or f 3 4 - 1 AKGKAENGGNQGADGVRFG FHRVLP FLGVS DGI ALRHAVX 

420 430 440 450 

Homology with a predicted ORF from N. gonorrhoeae 

ORF34 shows 77.6% identity over a 161aa overlap with a predicted ORF (ORF34.ng) from N. 
gonorrhoeae: 

orf 34 .pep QKSLSRI SLWGLGGVFFGVSGLVWFSLGVSXE CAC 35 
orf 34ng 
orf 34 .pep 
orf 34ng 
orf 34 .pep 
orf 34ng 
orf 34 .pep 



II I I I I I II II : I I I I I I I I I I I I I I I M III 
MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 



60 



FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGWRLPVGLSCV GRLXXLTRFFLGA 90 

M I I I t I I I f I : I I I I I It 1 I I I I I I I I : I I : I : I I IN Mill 

FSGVSFRGSGWGAFVGSTGVSLSVFSACVP VPVNESAARAASEGR— GLTRFFLGA 114 

AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 150 

Ml lllll I II II II II III 1111:111 III I II I II I III! I: II I I 

AGDGSPLPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 174 

S 175 



orf34ng PFGLNVLTMPTANAPMAVIQMSNTARIRS LGVS LKGLFGFFAI L I VLLGCRAMPS EGGS D 234 

The complete length ORF34ng nucleotide sequence <SEQ ID 213> is: 



1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 
51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG 
101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT 
151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG 
201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG 
251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC 
301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA 
351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG 
401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG 
451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 
501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC 
551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 
601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT 
651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 
701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC 
751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 
801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG 
851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG 
901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG 
951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG 
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG 
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG 
1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT 
1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg 
1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC 
1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG 
1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA 
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 214>: 

1 MMMPFIMLPW IAGVPAV PGQ KRLS RISLWG LAGVFFGVSG LVW FSLGVSF 

51 SLGVSLGCAC FSGV SFRGSG W GAFVGSTGV SLSVFSACV P VPVNESAARA 

101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA 

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR 

201 IRSL GVSLKG LFGFFAILIV LL GCRAMPSE GGSDGIAESA LDWLVEGND 

251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA 

301 DFGRVPSVAG DVARSARQGG DGNVWYAFG GLFGTCNLTD ELFFAFGGDL 

351 SEQQQVAWA DDGDLG RVAF GLWLAQVGT GGGF DTQRHN WIGLRAGGS 

401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS 

451 DGIALRHAV* 

ORF34ng and ORF34-1 show 90.0% identity in 459 aa overlap: 

10 20 30 40 4 50 

orf 34-1. pep MMMPFIMLPW I AGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVS LGCAC 

I MMMIM III I II MM:! II Itll I I 1:1 M Ml III III I! I II Mill 
orf34ng MMMPFIMLPW IAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFS LGVS FS LGVS LGCAC 

10 20 30 40 50 60 






60 70 80 90 100 110 

orf 34-1. pep FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 
I I I I 1 I I 1 I I I : I I M I I I I I M I I I I I I : : :: Ml I 111111111111111 
orf34ng FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGRGLTRFFLGAAGDGSP 
70 80 90 100 110 120 

120 130 140 150 160 170 

orf 34-1 . pep LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 
Mil I Mill I I I t I I I It III I II I II 1:1 III Mill I II INI II: II 1 1 1 I 1 I 1 I 
orf34ng LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV 

130 140 150 160 170 180 

180 190 200 210 220 230 

orf 34-1 . pep LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 
MM M I K M i I I I I M M ! M 1 1 I I I I I I f 1 1 I I M I M I It I M I ! M I I I I I M I I 
orf34ng LTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 

190 200 210 220 230 240 

240 250 260 270 280 290 

O r f 3 4 - 1 • pep LDWLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
I M II 1 1 h i I I I I I I M I I I I M M I I 1 1 II I I I I I I : I II I I I I II I I : I li M M f I 
orf34ng LDWLVEGNDFL YADGGADFLGNLRLFFGGEDAHNVGYI AVGN D FDARLCSGADAQQRGA 

250 260 270 280 290 300 

300 310 320 330 340 350 

or f 34-1 . pep DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 
Ml MIMIMI 1111111(1:11:1 I I I I I I I I I I I I M I I I I I I I M I I I I I I II I 
orf34ng DFGRVPSVAG DVARS ARQGGDGNVWYAFGGLFGTCNLTDE LFFAFGG DLSEQQQVAWA 

310 320 330 340 350 360 

360 370 380 390 400 410 

orf 34-1 . pep DDGDLGRVAFGLWLAQIGTGGGFDTQRHNVWGLRAGGSAVDGGFRADGGASDYCADAA 
MMMIMMIMIM:|IMMIMMMI:illlMMM I I I I II :| 11:11 
orf34ng DDGDLGRVAFGLWLAQVGTGGGFDTQRHNWIGLRAGGSAVDDGFCADGGPADDCAEAA 

370 380 390 400 410 420 

420 430 440 450 

orf 34-1 . pep AKGKAENGGN QGADGVR FG FHRVL P FLGVS DG I ALRHAVX 
1:1 MhllMMIII Mill I I I I I I I I I I I I I I I I I 
orf34ng AEGKAEDGGNQGADGVWFGFHRGLP FLGVS DGI ALRHAVX 

430 440 450 460 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
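The pairwise comparisons above report only an overall identity percentage. Where it is useful to see which residues differ between two homologues, the substitutions can be listed position by position. The following is a minimal sketch for two equal-length, ungapped fragments; the function name `list_substitutions` and the fragment-relative numbering are assumptions made for illustration. The two fragments are taken from the C-terminal regions of ORF34-1 and ORF34ng as listed above.

```python
def list_substitutions(reference, other):
    """List differences between two equal-length, ungapped fragments.

    Each difference is written as <reference residue><position><other
    residue>, with positions counted from 1 within the fragment.
    """
    if len(reference) != len(other):
        raise ValueError("fragments must have the same length")
    return [
        f"{a}{pos}{b}"
        for pos, (a, b) in enumerate(zip(reference, other), start=1)
        if a != b
    ]


if __name__ == "__main__":
    orf34_1 = "AKGKAENGGNQGADGVRFGFHRVLPFLGVS"   # fragment of SEQ ID 210
    orf34ng = "AEGKAEDGGNQGADGVWFGFHRGLPFLGVS"   # fragment of SEQ ID 214
    print(list_substitutions(orf34_1, orf34ng))
```

Positions here are relative to the chosen fragment, not to the full-length proteins, because the two full-length sequences differ in length upstream of this region.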



Example 26 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 215>: 



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGATT.CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGJAAAAAA GAAATCGTCT TCGGCACGAC 

151 CGTCGGCGAC TTCGGCGATA TGGTCAAAGA ACAAATCCAA GCCGAGCTGG 

201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC 

251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG 



This corresponds to the amino acid sequence <SEQ ID 216; ORF4>: 

1 MKTFFKTLSA AALALILAAC G.QKDSAPAA SASAAADNGA AKKEIVFGTT 
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GEL 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 217>: 



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT 

501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG 

551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This corresponds to the amino acid sequence <SEQ ID 218; ORF4-1>: 

1 MKTFFKTLSA AALALILAA C GGQKDSAPAA SASAAADNGA AKKEIVFGTT 

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF4 shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) from strain A of N. 
meningitidis: 

                   10        20         30        40        50       59
orf4.pep   MKTFFKTLSAAALALILAACG-QKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
           ||||||||||||||||||||| ||||||||||||||||||| ||||||||||||||||||
orf4a      MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAXKEIVFGTTVGDFGDMVKE
                   10        20        30        40        50        60

          60        70        80        90
orf4.pep   QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL
            || ||||||||||||| ||||| |||||||||
orf4a      XIQPELEKKGYTVKLVEXTDYVRXNLALAEGELDINVXQHXXYLDDXKKXHNLDITXVXQ
                   70        80        90       100       110       120

orf4a      VPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX
                  130       140       150       160       170       180
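The identity figure quoted for this comparison can be reproduced from the aligned sequences. The sketch below counts identical columns over the full overlap; the function name `percent_identity` is illustrative, and the conventions that gap columns stay in the denominator and that the unknown-residue symbol X never counts as a match are assumptions chosen because they reproduce the figure reported here.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (identical columns, overlap length, percent identity).

    Both inputs must already be aligned to the same length, with '-'
    marking gaps. Gap columns and 'X' residues are never counted as
    matches but remain in the overlap length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a not in "-X"
    )
    overlap = len(aligned_a)
    return matches, overlap, 100.0 * matches / overlap


# ORF4 (partial, with one gap) against ORF4a over the 93-column overlap.
ORF4 = "".join([
    "MKTFFKTLSA", "AALALILAAC", "G-QKDSAPAA", "SASAAADNGA", "AKKEIVFGTT",
    "VGDFGDMVKE", "QIQAELEKKG", "YTVKLVEFTD", "YVRPNLALAE", "GEL",
])
ORF4A = "".join([
    "MKTFFKTLSA", "AALALILAAC", "GGQKDSAPAA", "SASAAADNGA", "AXKEIVFGTT",
    "VGDFGDMVKE", "XIQPELEKKG", "YTVKLVEXTD", "YVRXNLALAE", "GEL",
])

if __name__ == "__main__":
    matches, overlap, pct = percent_identity(ORF4, ORF4A)
    print(f"{matches}/{overlap} identical = {pct:.1f}%")
```

With these inputs the count is 87 identical columns out of 93, which rounds to the 93.5% stated for ORF4 and ORF4a.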

The complete length ORF4a nucleotide sequence <SEQ ID 219> is: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC 

301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN 

351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTNGAT 

501 CAAACTCAAA GACNGCATCA NNNNGNNGNN NNNANCNANA NNNGANANNN 

551 NNNNANNNNT NNNNNNNNNN NNNNNCNNCG NNNNNNNANN NNNNNNNNNN 

601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN 

651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 220>: 

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AXKEIVFGTT 






51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH 

101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND 

151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX 

201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

A leader peptide is underlined. 



Further analysis of these strain A sequences revealed the complete DNA sequence <SEQ ID 221>: 



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT 

501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG 

551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 222; ORF4a-1>: 

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT 

51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

ORF4a-1 and ORF4-1 show 99.7% identity in 287 aa overlap: 



10 20 30 40 50 60 

orf4a-l MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
MM MM II Milt II II IIMIII llillll II 1 IIMIIIII I j II I II MIMIM 
orf4-l MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf4a-l QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
Mi 1 I M M M I M 1 1 It 1 1 M I 1 1 I I I I M I I I I 1 1 I i I I 1 1 1 I I I 1 1 1 I 1 1 I I 1 1 I I 
orf4-l QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKE HNLDITEVFQ 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf4a-l VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLK DGINPLTASK 

M I I M I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I M I I I I II 1 I I I I I I I I I I I 
orf4-l VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf4a-l ADIAENLKN IKI VELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQE PSFAYVNWS 

I | I I I I II I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I II II M I I I I I I 
orf4-l ADIAENLKN I KIVELEAAQL PRSRADVDFAWNGN YAIS SGMKLTEAL FQEPS FAYVNWS 

190 200 210 220 230 240 

250 260 270 280 

orf4a-l AVKTADKDSQWLKDVTEAYN S DAFKAYAHKRFEGYKS PAAWNEGAAKX 

I I I I I 1 M 1 1 1 1 1 1 1 1 1 1 1 M 1 1 I 1 1 1 M I 1 1 II 1 1 i 1 1 11 1 1 1 I 1 I 1 
orf 4 -1 AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKS PAAWNEGAAKX 

250 260 270 280 
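
For illustration, a percent identity figure of the kind quoted for these alignments can be recomputed from two aligned rows. The sketch below assumes that identity is counted over every aligned column, gaps included, which is one common convention; the function name `percent_identity` is hypothetical and the alignment program used in this work may count differently.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-"):
    """Return (identical columns, alignment length, percent identity).

    Both rows must already be aligned, i.e. have equal length with gap
    characters inserted. A column counts as identical only when both rows
    carry the same residue and that residue is not a gap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return identical, len(row_a), 100.0 * identical / len(row_a)


if __name__ == "__main__":
    # Residues 61-70 of ORF4a-1 and ORF4-1 as listed in this Example.
    print(percent_identity("QIQPELEKKG", "QIQAELEKKG"))
```

Under this convention, a single mismatch in 287 aligned columns gives 286/287, which rounds to the 99.7% quoted above for ORF4a-1 and ORF4-1.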
Homology with an outer membrane protein of Pasteurella haemolytica (accession q08869). 
ORF4 and this outer membrane protein show 33% aa identity in 91 aa overlap: 

10 20 

lip2. pasha MN FKKLLGVALVS ALALTACKDEKAQAP 

S I I I : : I I 111:11 : I : I 

ORF4 VXTPNPDGRTPCPSFLFETATTSGENMKTFFKTLSAAAL — ALILAACGFKKTARPPHPL 

110 120 130 140 150 

30 40 50 60 70 80 

lip2.pasha -ATTAKTENKAPLKVGVMTGPEAQMTEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKD 

: I : |: :| ::|:: :: III I : I I : I I : I : : I I II : 
ORF4 LPPPTTARRKKEIVFGTTVGDFGDMVKEQIQAELEECKGYTVKLVEFTDYVRPNLALAEGE 
160 170 180 190 200 210 

90 100 110 120 130 140 

lip2 . pasha LDANAFQT VPYLEQE VKDRGYKLAI I GNTLVWPIAAYSKKI KN I SE LKDGATVAI PNNAS 
I 

ORF4 h ". 

Homology with a predicted ORF from N. gonorrhoeae 

ORF4 shows 93.6% identity over a 94aa overlap with a predicted ORF (ORF4.ng) from N. 
gonorrhoeae: 

10 20 30 

orf4nm.pep MKTFFKTLSAAALALILAACGXQKDSAPAA 
25 I I II 11111:1:111 III III MM MM j 

orf4ng RANAVXTPNPDGRTPCLSFLFETATTSGENMKTFFKTLSTASLALILAACGGQKDSAPAA i 

200 210 220 230 240 250 | 

40 50 60 70 80 89 

orf4nm.pep SASA-AADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 

11:1 ^ I i I I 1 I I I 1 ! I I I I 1 I ! I I I I I I I I I I I 1 i I I I ! I I I 1 I I I I I I I I I I 1 I I t I I 
orf4ng SAAAPSADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 
260 270 280 290 300 310 

90 

orf4nm.pep EGEL 
MM 

orf4ng EGELDINVFQHKPYLDDFKKEHNLDITEAFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPN 
320 330 340 350 360 370 

The complete length ORF4ng nucleotide sequence <SEQ ID 223> was predicted to encode a 
protein having amino acid sequence <SEQ ID 224>: 

1 MKTFFKTLST ASLALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT 

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS 

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

Further analysis revealed the complete length ORF4ng DNA sequence <SEQ ID 225> to be: 

1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT 

51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG 

101 CCCCTTCTGC CGATAACGgc gCgGCGAAAA AAGAAAtcgt ctTCGGCACG 

151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct 

201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC 

251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA 

301 CACAAACCCT ATCTTGACGA TTTCAAAAAA GAACACAACC TGGACATCAC 

351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC 

401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac 

451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG 

501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA 

551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA 

601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA 

651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA 

701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC 

751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC 

801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG 

851 AAGGCGCAGC CAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 226; ORF4ng-1>: 



1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT 

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS 

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

This shows 97.6% identity in 288 aa overlap with ORF4-1: 



10 20 30 40 50 59 

orf 4-1 . pep MKTFFKTLSAAAIALILAACGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK 
I I I I I I I I I I I i I II I I I I I I I I I I I I I I I I I : I : I I I II 1 I I I I I II I I I I I I I I I I I 
orf4ng-l MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 

10 20 30 40 50 60 



60 70 80 90 100 110 119 

orf 4-1 . pep EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF 
I I I I I I I I M I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I : I 
orf4ng-1 EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 

70 80 90 100 110 120 



120 130 140 150 160 170 179 

orf 4-1. pep QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTAS 
I ill II I II I M II I I I I I II I 111 MM Ml I I I I I: II I I: Ml II I I Mill INI I 
orf4ng-l QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 

130 140 150 160 170 180 

180 190 200 210 220 230 239 

orf 4-1 . pep KADIAENLKN IKI VELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPS FAYVNW 

Mill M Ml M I II I II MM I IM II I MM II I II II M I Ml I M I III I Ml I II 
orf4ng-l KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNW 

190 200 210 220 230 240 



240 250 260 270 280 

orf 4-1 . pep S AVKTADKDSQWLKDVTEAYNS DAFKAYAHKRFEG YKS PAAWNEGAAKX 

M I I I I II I I 11 I I II I II I II I I I I I I I I I I I I I I I II I It II I I II 
orf4ng-l SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

250 260 270 280 



In addition, ORF4ng-1 shows significant homology with an outer membrane protein from the 
database: 



ID LIP2_PASHA STANDARD; PRT; 276 AA. 

AC Q08869; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 

DE 28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR. . . . 

SCORES Initl: 279 Initn: 416 Opt: 494 

Smith-Waterman score: 494; 36.0% identity in 275 aa overlap 

10 20 30 40 50 

orf4ng-1.pep MKTFFKTLSAAAL--ALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDM 
II I ::[| I! |: I I :| :|||::| :::| I I |: :| ::| 

lip2 pasha MN FKKLLGVALVS ALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 

10 20 30 40 50 



60 70 80 90 100 110 

orf4ng-1.pep VKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITE 
:: :: III I : I I : I I : I : : I I I I :ll 1:11 Ml:: |::: :: 
lip2_pasha TEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKDLDANAFQTVPYLEQEVKDRGYKLAI 

60 70 80 90 100 110 

120 130 140 150 160 170 

or f 4ng-l . pep AFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLT 

: : : I : : I I : I : : I : It I : I I : I I : I I I I I I : : I : I : I I It I : 
lip2_pasha IGNTLVWPIAAYSKKIKNISELKDGATVAIPNNASNTARALLLLQAHGLLKLKDPKN-VF 
120 . 130 140 150 160 170 

180 190 200 210 220 230 

orf4ng-1.pep ASKADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTE--ALFQEPSFA 

I . - || || | | | | | | : ; : : | | I I : : I I : I : : I ] : : I : : : : : : 
lip2_pasha ATENDIIENPKNIKIVQADTSLLTRMLDDVELAVINNTYAGQAGLSPDKDGIIVESKDSP 

180 190 200 210 220 230 

240 250 260 270 280 289 

orf 4ng-l . pep YVNWSAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

III • • ' I I : I * ::::::: I I I : I 

lip2_pasha YVNLWSREDNKDDPRLQTFVKS FQTEEV FQEALKLFNGG WKGW 

240 250 260 270 

Based on this analysis, including the homology with the outer membrane protein of Pasteurella 
haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment 
site in the gonococcal protein, it was predicted that these proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
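
Purely as an illustration of how a putative lipoprotein lipid attachment site might be located, the sketch below scans the N-terminal region of a protein with a regular expression. The pattern is written from the commonly cited PROSITE-style lipobox consensus and should be treated as an assumption; the function name `find_lipobox` and the 40-residue search window are likewise choices made for this example, and the source does not state which tool or pattern was used.

```python
import re

# Assumed lipobox consensus: six residues that are not D, E, R or K,
# followed by [LIVMFWSTAG] x2, [LIVMFYSTAGCQ], [AGS] and the lipidated Cys.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox(protein: str, window: int = 40):
    """Return the 1-based position of the candidate Cys, or None.

    Only the first `window` residues are searched, since the site is
    expected near the end of the signal peptide.
    """
    match = LIPOBOX.search(protein[:window].upper())
    if match is None:
        return None
    # match.end() is the 0-based index just past the Cys, which equals
    # the 1-based position of the Cys itself.
    return match.end()


if __name__ == "__main__":
    # N-terminus of ORF4ng-1 as listed in this Example.
    print(find_lipobox("MKTFFKTLSAAALALILAACGGQKDSAPAA"))
```

A hit of this kind is only suggestive; it does not by itself establish lipid modification or surface exposure.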

ORF4-1 (30kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figures 8A and 
8B show, respectively, the results of affinity purification of the His-fusion and GST-fusion 
proteins. Purified His-fusion protein was used to immunise mice, whose sera were used for ELISA 
(positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a bactericidal assay 
(Figure 8E). These experiments confirm that ORF4-1 is a surface-exposed protein, and that it is a 
useful immunogen. 
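
As a rough cross-check of the approximately 30kDa size quoted for ORF4-1, an average molecular mass can be estimated from the amino acid sequence. The sketch below is illustrative only: the residue masses are rounded average values supplied for this example, the names `RESIDUE_MASS` and `average_mass` are hypothetical, and no account is taken of signal peptide cleavage, lipidation or the fusion partners used for expression.

```python
# Rounded average residue masses in daltons (assumed reference values).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added back for the free termini

# ORF4-1 (SEQ ID 218) as listed in this Example, without the stop symbol.
ORF4_1 = (
    "MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTT"
    "VGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQH"
    "KPYLDDFKKEHNLDITEVFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPND"
    "PSNFARVLVMLDELGWIKLKDGINPLTASKADIAENLKNIKIVELEAAQL"
    "PRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWSAVKTADKDSQ"
    "WLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK"
)


def average_mass(protein: str) -> float:
    """Sum the average residue masses and add one water molecule."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    print(f"{len(ORF4_1)} residues, about {average_mass(ORF4_1) / 1000:.1f} kDa")
```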

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1. 
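
A profile of the general kind plotted in such figures can be sketched as a sliding-window average over a per-residue scale. The example below uses the Kyte-Doolittle hydropathy scale as a stand-in, which is an assumption: the source does not name the scale or window used, and on this scale positive values indicate hydrophobic stretches, so a hydrophilicity plot would have the opposite sense. The names `KD_SCALE` and `sliding_mean` are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_mean(protein: str, window: int = 7):
    """Return the mean scale value for every full window along the protein."""
    scores = [KD_SCALE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # First 20 residues of ORF4-1; the leader region scores as hydrophobic.
    for start, value in enumerate(sliding_mean("MKTFFKTLSAAALALILAAC"), start=1):
        print(start, round(value, 2))
```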
Example 27 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 227>: 

1 CCTCGTCGTC CTCGGCATGC TCCAGTTTCA AGGGGCGATT TACTCCAAGG 

51 CGGTGGAACG TATGCTCGGC ACGGTCATCG GGCTGGGCGC GGGTTTGGGC 

101 GTTTTATGGC TGAACCAGCA TTATTTCCAC GGCAACCTCC TCTTCTACCT 

151 CACCGTCGGC ACGGCAAGCG CACTGGCCGG CTGGGCGGCG GTCGGCAAAA 

201 ACGGCTACGT CCCTmTGCTG GCAGGGCTGA CGATGTGTAT GCTCATCGGC 

251 GACAACGGCA GCGAATGGCT CGACAGCGGA CTCATGCGCG CCATGAACGT 

301 CCTCATCGGC GyGGCCATCG CCATCGCCGC CGCCAAACTG CTGCCGCTGA 

351 AATCCACACT GATGTGGCGT TTCATGCTTG CCGACAACCT GGCCGACTGC 

401 AGCAAAATGA TTGCCGAAAT CAGCAACGGC AGGCGCATGA CCCGCGAACG 

451 CCTCGAGGAG AACATGGCGA AAATGCGCCA AATCAACGCA CGCATGGTCA 

501 AAAGCCGCAG CCATCTCGCC GCCACATCGG GCGAAAGCTG CATCAGCCCC 

551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC 

601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG 
651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC 

701 GC AGACACGCCC GCCGCATCCG 

751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC 

801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA 

851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA 

901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA 

This corresponds to the amino acid sequence <SEQ ID 228; ORF8>: 

1 PRRP RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR 

51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT 

101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 

151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP 

201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN Q 

251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH 

301 PPQMAGCPRT PTPAPKPA* 

Computer analysis of this amino acid sequence gave the following results: 
Sequence motifs 

ORF8 is proline-rich and has a distribution of proline residues consistent with a surface 
localization. Furthermore the presence of an RGD motif may indicate a possible role in bacterial 
adhesion events. 
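
For illustration, the two sequence features mentioned above (proline content and the presence of an RGD motif) can be tabulated with a few lines of code. The function names `motif_positions` and `residue_fraction` are hypothetical, and the sketch makes no claim about how the original analysis was carried out or about what proline content should count as proline-rich.

```python
def motif_positions(protein: str, motif: str = "RGD"):
    """Return the 1-based start positions of every occurrence of `motif`."""
    size = len(motif)
    return [
        i + 1
        for i in range(len(protein) - size + 1)
        if protein[i:i + size] == motif
    ]


def residue_fraction(protein: str, residue: str = "P") -> float:
    """Return the fraction of positions occupied by `residue`."""
    return protein.count(residue) / len(protein)


if __name__ == "__main__":
    # First 44 residues of the partial ORF8 sequence listed in this Example.
    orf8_start = "PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR"
    print(motif_positions(orf8_start))
    print(round(residue_fraction(orf8_start), 3))
```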

Homology with a predicted ORF from N. gonorrhoeae 

ORF8 shows 86.5% identity over a 312aa overlap with a predicted ORF (ORF8.ng) from N. 
gonorrhoeae: 



orf8ng      1 MDRDDRLRRPRHAPVPRRDLLQRGGTYARYGHRAGRGFGRFMAEPALFPR  50
                     |||||||| | |||| |||||| ||||||||||||||||||||
orf8.pep    1       PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR  44

orf8ng     51 QPPLLPDHRHGKRTGRLGGGRQKRLRPYVGGADDVHAHRRQRQRMARQRP  100
              ||||||  |||||||||||||||||||  | |||| |||||||||||||
orf8.pep   45 QPPLLPHRRHGKRTGRLGGGRQKRLRPXAGRADDVYAHRRQRQRMARQRT  94

orf8ng    101 DARDERPHRRRHRHCRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ  150
               || |||||| ||| |||||||||||||||||||||||||||||||||||
orf8.pep   95 HARHERPHRRGHRHRRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ  144

orf8ng    151 AYDARTFGAEYGQNAPNQRTHGQKPQPPRRHIGRKPHQPLHDGSHAARPP  200
              | | ||   | | |||||||||||||| ||||||| ||| ||||||||||
orf8.pep  145 AHDPRTPRGEHGENAPNQRTHGQKPQPSRRHIGRKLHQPRHDGSHAARPP  194

orf8ng    201 QNRQHHRAAPDHRRQAAISQTQRQRNPAARPPLHTAPNRPATNRRPHQRQ  250
               |||||||||||||||||||||||||||| ||||||||           |
orf8.pep  195 XNRQHHRAAPDHRRQAAISQTQRQRNPAAXPPLHTAPN           Q  244

orf8ng    251 TRPPHPHRHRHQPRTGSPRRTPPLPMAGFPLAQHQYASGNFRPRHPPATH  300
              |||||||||||||||||||||||||||| ||||| ||||||||||| |||
orf8.pep  245 TRPPHPHRHRHQPRTGSPRRTPPLPMAGLPLAQHRYASGNFRPRHPAATH  294

orf8ng    301 PPQMAGCPRTPTPAPKPA*  319
              |||||||||||||||||||
orf8.pep  295 PPQMAGCPRTPTPAPKPA*  313





The complete length ORF8ng nucleotide sequence <SEQ ID 229> is predicted to encode a protein 
having amino acid sequence <SEQ ID 230>: 

1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR 

51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP 

101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 

151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP 
201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLHTAPNRP ATNRRPHQRQ 
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQYASGN FRPRHPPATH 
301 PPQMAGCPRT PTPAPKPA* 

Based on the sequence motifs in these proteins, it is predicted that the proteins from N.meningitidis 
and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 28 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 231>: 



1 ..GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG 
51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT 
101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC 
151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA 
201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG 
251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG 
301 GCTTT.GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA 
351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG 
401 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC 
451 GGACATTATC TCGGAGA.GG AACCATCATG CCCGGTTTCC ACCTGATGAA 
501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC 
551 GTTATCCTTT CCCGACCGG.. 



This corresponds to the amino acid sequence <SEQ ID 232; ORF61>: 

1 ..EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY 

51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ 

101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD 

151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT.. 

Further work revealed the complete nucleotide sequence <SEQ ID 233>: 



1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT 
501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG 
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA 
651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 
701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA 
751 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 
901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 
1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA 
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 
1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG 
1751 CCGAAGGCAG GGAATATGAA CATATTTAA 

This corresponds to the amino acid sequence <SEQ ID 234; ORF61-1>: 



1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 

51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 

151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG 

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE 

251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 

301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL 

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA 

401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK 

501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI* 

Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1. Further 
computer analysis of this amino acid sequence gave the following results: 

Homology with the baf protein of B. pertussis (accession number U12020) 
ORF61 and baf protein show 33% aa identity in 166aa overlap: 

orf61 23 LLLDGGN SRLKWAWVE -NGT FATVGSAPYR DLSPLGAEWAEKADGNVRIVGCAVCG 77 

+L+D GNSRLK W + + A AP DL LG A R +G V G 

baf 3 ILIDSGNSRLKVGWFDPDAPQAAREPAPVAFDNLDLDALGRWLATLPRRPQRALGVNVAG 62 



orf61 78 EFKKAQVQEQLAR KIEWLPSSAQAXGIRNHYRHPEEHGSDRW FNALGSRRFSRN 131 

+ + L I WL + A G+RN YR+P++ G+DRW L + 

baf 63 LARGEAIAATLRAGGCD IRWLRAQPLAMGLRNGYRNPDQLGADRWACMVGVLARQP SVHP 122 

orf61 132 ACVWSCGTAVTVDALTDDGHYLGXGTIMPGFHLMKESLAVRTANL 177 

+V S GTA T+D + D + G G I+PG +M+ +LA TA+L 
baf 123 PLLVASFGTATTLDTIGPDNVFPG-GLILPGPAMMRGALAYGTAHL 167 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF61 shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) from strain A of N. 
meningitidis: 



10 20 30 

orf 61 . pep E I SLR S DXRPVSVXKRRDSERFZjLLDGGN S 

MINI! I I I I I 1 1 ! I I I I I I I I ! 1 1 I I 
orf 61a TVFEGTVKGVDGQGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS 
290 300 310 320 330 340 



40 50 60 70 80 90 

orf 61 . pep RLKWAWVENGT FATVGS AP YRDLS PLGAEWAEKADGNVRI VGCAVCGE FKKAQVQEQLAR 
I I I I I I I 1 I I I M I I I I II I II I I M M I I I I I : I I I I I ! I I I M I I I I I I I I I I I i I I I 
orf 61a RLKWAWVENGT FATVGS APYRDLS PLGAEWAEKVDGNVRI VGCAVCGE FKKAQVQEQLAR 

350 360 370 380 390 400 



100 110 120 130 140 150 

orf 61 . pep KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRN ACVWSCGTAVTVDALT DD 
I I 1 I I I t 1 t 1 I I I I I I I I I I! I I I I t I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I M I 
orf 61a KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRN ACVWSCGTAVTVDALT DD 
410 420 430 440 450 460 



160 170 180 189 

or f 61 . pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 

inn 1 1 ii 1 1 1 1 1 ii 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n 

orf 61a GHYLG- GTIMPGFHLMKE S LAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCG S VMMM 

470 480 490 500 510 520 



orf 61a 



HGRLKEKTGAGKPVDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGG 
530 540 550 560 570 580 

The complete length ORF61a nucleotide sequence <SEQ ID 235> is: 

1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT 
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT 
501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 
651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 
701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA 
751 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 
901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG 
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA 
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 
1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG 
1751 CCGAAGGCGG GGAATCGGAA CATACTTAA 



This encodes a protein having amino acid sequence <SEQ ID 236>: 



1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 
151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG 
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE 
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL 
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA 
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK 
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT* 



ORF61a and ORF61-1 show 98.5% identity in 591 aa overlap: 



10 20 30 40 50 60 

or f 61a . pep MTVl^PSHWRVIJ^I^DGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRG LLRQHDGYWR 
Mill 1 I I I I I I I I I I I I I I 1 I I I I I 1 I I I I I I I I I I 1 I I I I I I I I | i | | | | | | | | | | | 
orf61-l tfTVLKLSHWRVIJ^LADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 61a. pep LVRPIJVVFDAEGLRELGERSGFQTALKHECASSNDEILEIJ^IAPDKAHKTICVTRLQSK 
I t M ! I I I I I I I I I II I I I I 1 II I I I I I I I I I I I I M I 1 I M M I I I I M ! i i I M I I II 
orf 61-1 lvrpiavfdaeglreixsersgfqtalkhecassndeii^ij^ 

70 80 90 100 110 120 



130 



140 



150 



160 



170 



180 
orf61a.pep GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQIKWPN 
I I Mi II H MIMM I II I II I Ml! ! II I II I IMMM MM Ml INI : 1 I I M I 
orf61-l GRGRQGRKWSHRIX3ECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 
130 140 150 160 170 180 

190 200 210 220 230 240 

or f 61a . pep DLWGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 
I I I I I I I M I I I II M I I I I I I I I 1 1 I I I I M I I 1 1 I I I llti I M I { 1 1 I If I 1 1 I I ] I 
orf 61-1 DLWGRDKLGG I LIETVRTGGKTVAWGI GIN FVLPKEVENAAS VQS LFQTASRRGNADA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 61a . pep AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 
I I M I I I I * I I I M M M I I I I ! 1 1 1 I I [ I I M I f I I I t II M I f I f 1 { I I I I I I M 1 1 f 
orf 61-1 AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 61a . pep QGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 
I I I I M I I J I I I I I I I I I II I II I I I I I I I I I I I I I I I I I II I | | | | | | | | | | | | | || | | 
orf 61-1 QGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 

310 320 330 340 350 360 

370 380 390 400 410 420 

or f 61a . pep ATVGSAPYRDLS PLGAEWAEKVDGNVRI VGCAVCGE FKKAQVQEQLARKIEWLPS SAQAL 
II II I I M I I II I I I II I I I I : II I II I II I I I II I I M I I I I I I I I I I I I I I I I I I I I I 
orf 61-1 ATVGSAPYRDLS PLGAEWAEKADGNVR I VGCAVCGE FKKAQVQEQLARKIEWL PS SAQAL 

370 380 390 400 410 420 

430 440 450 460 470 480 

or f 61 a . pep GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 
I I I I II I I I I I I I I I I M I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I M I I I I I II I 
orf 61-1 GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 

430 440 450 460 470 480 

490 500 510 520 530 540 

or f 61a . pep HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASG^D 

M I I MMM I II M I II II MMMIMI I II MMI Ml III MM II M M I M II I 
orf 61-1 HLMKE S LAVRTANLNRHAGKRYPFPTTTGNAVAS GMMDAVCGS VMMMHGRLKEKTGAGKP 

490 500 510 520 530 540 

550 560 570 580 590 

or f 61a . pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX 

III I MMI MM MM II II MMM MM II MM MM II I II I I I 
orf 61-1 VDV I ITGGGAAKVAEALPPAFLAENTVRVADNLVI YGLLNMI AAEGREYEHIX 

550 560 570 580 590 



Homology with a predicted ORF from N. gonorrhoeae 

ORF61 shows 94.2% identity over a 189aa overlap with a predicted ORF (ORF61.ng) from N. 
gonorrhoeae: 

orf61.pep                                EISLRSDXRPVSVXKRRDSERFLLLDGGNS  30
                                         ||||| | | ||| || |||||||| ||||
orf61ng    TVCEGTVKGVDGRGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS  211

orf61.pep  RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR  90
           |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf61ng    RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLAR  271

orf61.pep  KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD  150
           ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf61ng    KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD  331

orf61.pep  GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT  189
           ||||| ||||||||||||||||||||||| |||||||||
orf61ng    GHYLG-GTIMPGFHLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMM  390
An ORF61ng nucleotide sequence <SEQ ID 237> was predicted to encode a protein having amino 
acid sequence <SEQ ID 238>: 



1 WFSFGWAFDR PQYELGSLSP VAALACRRAL GCLGLETQIK WPNDLVVGRD 
51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN 
101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDH GKAVLLLRDG 
151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS 
201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR 
251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW 
301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL 
351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA 
401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG 
451 ESEHA* 



Further analysis revealed the complete gonococcal DNA sequence <SEQ ID 239> to be: 



1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 
51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC 
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 
201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 
401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT 
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT 
501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC 
601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 
651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 
701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 
751 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT 
801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 
851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 
901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 
951 cggcgaaaTC AGcctGCggc ccgacaacaG GTCGGtttcc gtgccgaagc 
1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 
1051 AAGTGGGCGT GggtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc 
1101 gtaCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 
1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 
1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 
1401 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA 
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA 
1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT 
1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA 
1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 
1701 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 
1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA 



This corresponds to the amino acid sequence <SEQ ID 240; ORF61ng-1>: 



1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG 
51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL 
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY 
151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG 
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE 
251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG 
301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL 
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA 
401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK 
501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA 
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA* 



ORF61ng-1 and ORF61-1 show 93.9% identity in 591 aa overlap: 

orf61ng-1.pep MTVLKPSHWRVLAELADGLPQHVSQLAREADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60 

It I I 1 I I I ! I i I 1 I 1 I I I t I I t I I i I I I t 1 1 I I I I I I i t I 1 1 I 1 I t I I I I I I J 

or f 6 1 - 1 MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQ^ 60 

orf 61nq-l .pep LVRPLAVFDAEGLRDLGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 

Mlllllll lllll:MIM I II I I III I! Mill III Ml Ml III I I MM I M II I I 
or f 6 1 - 1 LVRPIAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 

orf 61ng-l .pep GRGRQGRKWSHRLGECLMFSFGWAFDRPQYELGSLSPVAALACRRALGCLGLETQIKWPN 180 

I | M M I I I I I I 1 1 I M I M I I ! : ! I M M I M M I I I I 1 : 1 I I M I : II I : : M M I I 

or f 6 1- 1 GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180 

orf 61ng-l .pep DLWGRDKLGGI LIETVRAGGKTVAWG IGINFVLPKEVENAASVQSLFQTASRRGNADA 240 

MIIMIIIMMIIMMI MM MUM Mill Ml III MMII I I llllllllli I 
orf 61-1 DLWGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240 

orf 61ng-l .pep AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300 

M II I II M II Ml lll::IMII: I I : : I M I M M M M I I I I M Mlllllll 
orf 61-1 AVLLETLLVELDAVLLQYARDG FAP FVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300 

orf 61ng-l .pep RGVLHLETAEGEQTWSGEISLRPDNRSVSVPKRPDSERFLLLEGGNSRLKWAWVENGTF 360 

:MIMIIMI:IMMMIIM Ml MMM M II M M : I I I M I M II I I II I I 
orf 61-1 QGVLHLETAEGKQT WSGE I SLRS DDRPVS V PKRRDSERFLLLDGGNSRLKWAWVENGT F 3 60 

orf 61ng-l .pep AWGSAPYRDLS PLGAEWAEKADGNVRI VGCAVCGESKKAQVKEQLARKI EWLPS S AQAL 420 

II I I i I I M 1 i II I I II I I M II I I I I I I I I! I I I I f I I t t : I I I I I I 1 I t I I I I I I 1 1 
orf 61-1 ATVGS APYRDLS PLGAEWAEKADGNVRI VGCAVCGEFKKAQVQEQLARKI EWLPSSAQAL 420 

or f 61ng-l . pep GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 480 

M II M I M I I II M MM M M M III I M M I MM M I I I I I I I I I i I I i 1 1 1 1 I I I 
orf 61-1 GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 480 

orf 61ng-l .pep HLMKE SLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCG S IMMMHGRLKEKNGAGKP 540 

M I I I I I I I II I I I I M I I I I I I I II I I I I I I I I I I I : I I I I I i I I M : I II I 1 

orf61-l HMBCESI^VRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540 

orf 61ng-l .pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593 

I 1 I i 1 1 1 I t I I 1 I I I I r I 1 I i i I I I 1 I 1 I I I I I E t : 1 I I 1 r | | I I } I || 
orf 61-1 VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593 

Based on this analysis, including the homology with the baf protein of B.pertussis and the presence 
of a putative prokaryotic membrane lipoprotein lipid attachment site, it is predicted that these 
proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 



Example 29 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 241>: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGaAGAGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC. . 



This corresponds to the amino acid sequence <SEQ ID 242; ORF62>: 




1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGLGC.. 
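
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the DNA in reading frame 1 with the standard genetic code. The sketch below is illustrative only: the function name `translate` and the choice of the standard code (NCBI translation table 1) are assumptions made for the example, not details stated in the text. It translates the first 60 nucleotides of SEQ ID 241 as printed above.

```python
# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is removed, a trailing partial codon is ignored."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    # Codons containing anything other than A, C, G or T are reported as X.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First 60 nucleotides of SEQ ID 241, copied from the listing.
seq_241_start = "ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC CGCCAAATAT"
print(translate(seq_241_start))  # MFYQILALIIWSSSFIAAKY
```

The output matches the first twenty residues of SEQ ID 242 (ORF62).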

Further work revealed the complete nucleotide sequence <SEQ ID 243>: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 
851 AATAA 



This corresponds to the amino acid sequence <SEQ ID 244; ORF62-1>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL 
251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK* 



Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical transmembrane protein HI0976 of H. influenzae (accession number Q57147) 
ORF62 and HI0976 show 50% aa identity in 114aa overlap: 



Orf62 1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60 
 M YQILAL+IWSSS IKY +DP L+V VR R KI + K 
HI0976 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLWQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60 

Orf62 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114 
 L ++F NY LLQF+GLKYTSA+SA ++GLEPLL+VFVGHFFF K + 
HI0976 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLWFVGHFFFKTKQNGF 114 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF62 shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) from strain A of N. 
meningitidis: 



                    10        20        30        40        50        60
orf62.pep   MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a      MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf62.pep   LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a      LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf62.pep   AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a      AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
                   130       140       150       160       170       180

                   190       200       210
orf62.pep   AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC
            |||||||||||||||||||||||||||||||||:||
orf62a      AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI
                   190       200       210       220       230       240

orf62a      SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX
                   250       260       270       280
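
The identity figures quoted for these comparisons can be reproduced by counting identical columns in an alignment and dividing by the length of the overlap. The sketch below is a minimal illustration under that assumption; the function name `percent_identity` and its treatment of gap columns are choices made for the example, and the alignment program used for the original analysis may count slightly differently. It is applied to residues 181-216 of ORF62 and ORF62a, where the only difference is L/V at position 214.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns whose residues are identical.

    Both strings must be rows of the same alignment (equal length).
    Columns containing a gap count toward the overlap length but never match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return 100.0 * matches / len(seq_a)


orf62_tail = "AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC"   # ORF62, residues 181-216
orf62a_tail = "AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGC"  # ORF62a, residues 181-216

print(round(percent_identity(orf62_tail, orf62a_tail), 1))  # 97.2 (35 of 36)

# Over the full 216 aa overlap the same single mismatch gives 215/216.
print(round(100.0 * 215 / 216, 1))  # 99.5
```

The second figure agrees with the 99.5% identity reported for ORF62 and ORF62a.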



The complete length ORF62a nucleotide sequence <SEQ ID 245> is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 
851 AATAA 



This encodes a protein having amino acid sequence <SEQ ID 246>: 



1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL 
251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK* 



ORF62a and ORF62-1 show 98.9% identity in 284 aa overlap: 



orf62a.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62a.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62a.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62a.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240
            |||||||||||||||||||||||||||||||||:||:|||||||||||||||||||||||
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240

orf62a.pep  SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX 285
            |||||||||||||||||||||||:|||||||||||||||||||||
orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285



Homology with a predicted ORF from N. gonorrhoeae 

ORF62 shows 99.5% identity over a 216aa overlap with a predicted ORF (ORF62ng) from N. 
gonorrhoeae: 






orf62.pep   MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
            |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62.pep   LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62.pep   AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62.pep   AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216
            ||||||||||||||||||||||||||||||||||||
orf62ng     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240



The complete length ORF62ng nucleotide sequence <SEQ ID 247> is: 



1 ATGTTTtACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGCGG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG 
751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG 
851 ACGCGCAAAA CGGCAATGCC GTCTGA 



This encodes a protein having amino acid sequence <SEQ ID 248>: 

1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL 
251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V* 



ORF62ng and ORF62-1 show 97.9% identity in 283 aa overlap: 






                    10        20        30        40        50        60
orf62ng.pep MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
            |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf62ng.pep LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf62ng.pep AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf62ng.pep AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI
                   190       200       210       220       230       240

                   250       260       270       280       290
orf62ng.pep SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRRDAQNGNAVX
            ||||||||||||||||||||||||||||||||||::|||||::
orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX
                   250       260       270       280

Furthermore, ORF62ng shows significant homology to a hypothetical H.influenzae protein: 

sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163 
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20) 
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128 

Score = 106 bits (262), Expect = 2e-22 

Identities = 56/114 (49%), Positives = 68/114 (59%) 

Query: 1 MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60 
 M YQILAL+IW SS I K Y +DP L+V VR R KI + K 
Sbjct: 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLWQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60 

Query: 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114 
 L ++F NY LLQF+GLKYTSA+SA ++GLEPLL+VFVGHFFF K + 
Sbjct: 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLWFVGHFFFKTKQNGF 114 



Based on this analysis, including the homology with the transmembrane protein of H.influenzae 
and the putative leader sequence and several transmembrane domains in the gonococcal protein, 
it is predicted that these proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 30 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 249>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA 

51 sGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGgtA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT 

251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA 

301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC 

351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC 

401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT 

451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG 

501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC 

551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGGGAaAa 

601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT 

651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC 

701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT 

751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA 

801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG 

851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT 

901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG 

951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG 

1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC 

1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT 

1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC 

1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT ACC. . 

This corresponds to the amino acid sequence <SEQ ID 250; ORF64>: 



1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 

51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN 

101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL 

151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK 

201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV 

251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV 



301 EPVLSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA 

351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT.. 
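
The partial sequence <SEQ ID 249> contains letters other than A, C, G and T (for example m, w, s, k, y and r), and the corresponding positions of <SEQ ID 250> are shown as X. The sketch below assumes that these letters are IUPAC nucleotide ambiguity codes and that a codon is reported as X whenever its possible readings encode more than one amino acid. This is an interpretation consistent with the printed sequences, not a procedure described in the text, and the function name `translate_ambiguous` is illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate frame 1; a codon is X unless every possible reading gives one residue."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        # Unknown symbols (e.g. a dot in the listing) are treated like N.
        options = [IUPAC.get(base, "ACGT") for base in dna[i:i + 3]]
        residues = {CODON_TABLE["".join(codon)] for codon in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 60 nucleotides of SEQ ID 249, copied from the listing.
seq_249_start = "ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA sGGACTGACG"
print(translate_ambiguous(seq_249_start))  # MRRFLPIAAICAXXLXXGLT
```

The result reproduces the first twenty residues of SEQ ID 250: the codon GCm is still read as A because both GCA and GCC encode alanine, whereas Gwm, sTC, kkG and TAs are each ambiguous and appear as X.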

Further work revealed the complete nucleotide sequence <SEQ ID 251>: 






1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 
51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 
101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG 
251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC 
301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 
351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG 
401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC 
451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 
501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 
551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 
601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA 
651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT 
701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 
751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG 
801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA 
851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 
901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 
951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 
1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 
1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT 
1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 
1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 
1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 
1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 
1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT 
1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 
1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 
1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 
1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA 
1601 AACAGGTGGC GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG 
1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 
1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC 
1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG 
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 
1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG 
1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC 
1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG 
2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC 
2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA 
2101 ACGGTAAAAA CTTATGCGTA G 



This corresponds to the amino acid sequence <SEQ ID 252; ORF64-1>: 






1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING 
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP 
151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI 
201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL 
251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE 
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD 
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT 
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL 
451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT 
501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA 
551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ 
601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH 
651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK 
701 TVKTYA* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF64 shows 92.6% identity over a 392aa overlap with an ORF (ORF64a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf64 .pe P i^^^ 

orf64a I^ FLPI^I^WLLYGLTAATGSTSSIA DYFVWIVAFS^LLLVLSAVI^YVILLLK 
—Jq jo 30 40 50 60 



orf64.pep 



orf 64a 



70 80 90 100 110 120 

n P Dnr.T^. g YVftyyPX XXMFTLVAXLPGVFLFG FPRQFINGTINSWFGNDTHEALERSLN 

till 1 1 1 1 1 ii nun iiiiini i iiiiiiiiimiiiimii 

orf64a ^o^v^^TAKB-T.S GMFTLVAVLPGVFLFGV SAQFINGTINSWFGNDTHEALERSLN 

70 80 90 100 110 

130 140 150 160 170 180 

orf64 Pep LSKSAI^UUVDNALGNAVPVQIDLIGAASLPGDMGRVI£HYAGSGFAQ^YimSGKIE 
orf64.pep i*» ... | mm, | II II I II I III II Ml > Ml llllll 

LSKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIE 
12 0 130 140 150 160 170 

190 200 210 220 230 240 

orf 64 . pep KSINPHKLDQPFPGKARWEKIQRAGSVRDIJSSIGGyLTO 

I INI I MM II II MM M I I: I Mill II I llllll II I II M "111111111 
orf64a KSINPH^DQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQP 
180 190 200 210 220 230 

250 260 270 280 290 300 

m I ll II I III I III I llllll ll llllll II MM III I III 1 1 1 llllll l II 

lit tC" irrn"T IEKARAXXXXI g y g ^T.rvrPFT.ATI.LIASLLSIFLALVMALYFARRFV 
240 250 260 270 280 290 

310 320 330 340 350 360 

orf 64 . pep EPVLSIAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNIUWEQLSIAKDADERN 



orf 64. pep 
orf64a 



orf64a 



l I M I 1 1 1 1 I 1 1 I 1 1 I I I II I M I I 1 1 I I I I M I II I 1 1 M I M I M: II I I 1 1 I I II I 
EPVLSLAEGAKAVAQGDFSQTRPVUUJDEFGRLTKLE^HMTEQIiSIAKEADERNRRREEA 

300 310 320 330 340 350 



370 380 390 

or f 64 . pep ARHYLECVLEGLTTGVWFDEQGCLKTFNKAAGT 

orf64a ARHYLECVIiEGLOTGVWFDEQ^ 

360 370 380 390 400 410 

nrfG4a taevFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGWMVIDDITVLIHAQ 
420 430 440 450 460 470 

The complete length ORF64a nucleotide sequence <SEQ ID 253> is: 

1 ATGCGCCGTT ttctaccgat cgcagccata tgcgccgtcg tcctgttgta 
51 cggactgacg gcggcaaccg gcagcaccag ttcgctggcg gattatttct 
101 ggtggattgt tgcgttcagc gcaatgctgc tgctggtgtt gtccgccgtt 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT TATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATCT GGCGGCAGAC AACGCCCTTG 

401 GCAACGCCAT CCCCGTGCAG ATAGACNTCA TCGGCGCGGC TTCCCTGCCC 

451 NGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACAGGCGG GTTCGGTCAG GGATNNGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCANGGC TGGCTGTCGG CAGNNACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGNA ANANNNTNAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTNG CAACCCTGCT GATTGCCTCN CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 
901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 

1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA 

1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG 

1651 CGTTCCCCTT CGNCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTACG AAGCTGGTCC GTGCCGGTTT GCGGCGGAAC 

1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG 

1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC 

1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG 

2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC 

2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAGAAA CTTATGCGTA G 



This encodes a protein having amino acid sequence <SEQ ID 254>: 



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING 
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP 
151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI 
201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL 
251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE 
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD 
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT 
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL 
451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT 
501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX 
551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ 
601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH 
651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK 
701 TVETYA* 



ORF64a and ORF64-1 show 96.6% identity in 706 aa overlap: 



10 20 30 40 50 60 

orf 64a . pep MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 
j | I | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I i I I I I I I 1 I II I I I II I I II 
orf 64-1 MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 64a. pep DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 
|| | | || | || I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
orf 64-1 DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 64a . pep SKS ALNLAADN ALGNAI PVQI DXI GAASL PXDMGRVLEHYAGSGFAQLALYNAASGKI EK 
I I I I I I M I 1 I II I I I: i I I I I I I I II I I I I I I I I I I I I I II II II I I I I I I I I I I i I 
orf 64-1 SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 64a . pep S INPHKLDQPFPGKARWEKIQQAGSVRDXES IGGVLYAXGWLSAXTHNGRDYALFFRQPV 
I I I I I I I I I I I I I I I I I I I I I : I I t I I I MINIMI Mill I I I I I I I I 1 I I I f I I 
orf 64-1 S IN PHKLDQPFPGKARWEKIQRAGSVRDLES IGGVLYAQGWLSAGTHNGRDYALFFRQPV 

190 200 210 220 230 240 



250 260 270 280 290 300 



orf64a Deo PKGVAEDAVLIEKARAXXXXLS YSKKGLQT FFLATLLI ASLLSI FLALVMALYFARRFVE 
' P F 1 1 | I | I | 1 | | | I I t 1 I I 1 I I I I t I I I I I I I 1 I 1 I I 1 1 I t I I I t I I 1 I 1 I 1 I 1 t 1 1 I 
orf 64-1 PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSI FLALVMALYFARRFVE 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf64a oeD PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQIjSIAKEADERNRRREEAA 
1 | | | | | | 1 I 1 I I I 1 I I I I 1 1 1 I I 1 I I I t 1 I I I I I I I I 1 I I 1 1 1 i I I I 1 1 I I 1 I I I 1 t I I I 
orf 64-1 PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTICLFNHMTEQLSIAKEADERNRRREEAA 

310 320 330 340 350 360 

370 380 390 400 410 420 

or f 6 4 a pep RHYLECVLEGLTTGWVFDEQGCLKTFNKAAEQILGMPLTPLWGS SRHGWHGVSAQQSLL 

K 1 1 1 i t K 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I I 1 1 k I 1 1 1 1 1 1 

orf 64-1 RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 
'370 380 390 400 410 420 

430 440 450 460 470 480 

orf64a pep AEVFAAI GAAAGTDKPVHVKYAAP DDAKI LLGKAT VLPE DNXNGWMVI DDIT VLIHAQK 
| 1 1 1 1 I 1 1 I 1 I I 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 I 1 I 1 1 1 1 1 1 1 I I 1 I 1 1 I I t t I I I ! I I 1 1 1 1 I 
orf 64-1 AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIHAQK 
430 440 450 460 470 480 

490 500 510 520 530 540 

or f 64 a . pep EAAWGEVAKRLAHE I RN PLT PIQLSAERLAWKLGGKLDEXDAQI LTRST DT I IKQVAALK 
i | | | | | | M I I I i I I I II I I I I I I I I 1 I I I I I I I II I I I I I I I I I I I I M I s M 1 1 I I I 
orf 64-1 EAAWGEVAKRLAHE I RN PLTP I QLSAERLAWKLGGKLDEQDAQI LTRSTDT IVKQVAALK 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf64a pep EMVEAETINYXRSPSXQLENQDLNALIGDVIJVLYEAGPCRFAAELAGEPLMMAADTTAMRQ 
M 1 I I 1 1 1 i I I I I r | i I I 1 I 1 I 1 1 I I I 1 I I I I 1 I t I I I 1 t I 1 I t I I I :MMIIIII 
o r f 6 4 - 1 EMVEAFRNYARSPSLKLENQDLNAL IGDVLALYEAGPCRFAAELAGE PLT VAADTTAMRQ 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 64a . pep VLHN I FKNAAE AAEEADVPE VRVKSEAGQDGRI VLTVCDNGKGFGREMLHNAFE PYVT DK 
I I I I I I I I I ! I 1 1 I I I I I I I I I I I I I r 1 | | | 1 I I I I I I I 1 I I I I t I I I 1 I I I 1 I I I 1 I I 1 
orf 64-1 VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 

670 680 690 700 

orf 64a . pep PAGTGLXLPWKKI IEEHGGXI SLSNQDAGGAXVRI ILPKTVETYAX 
IIMM i I I 1 I I 1 I I I I I 1 I 1 I I I I i 1 t I 1 IIIIIMICIIM 
orf 64-1 PAGTGLGLPWKKI IEEHGGRI SLSNQDAGGACVRI ILPKTVKT YAX 

670 680 690 700 

Homology with a predicted ORF from N. gonorrhoeae

ORF64 shows 86.6% identity over a 387aa overlap with a predicted ORF (ORF64.ng) from N.
gonorrhoeae:

orf 64. pep 
orf64ng 
orf 64 .pep 
orf 64ng 
orf 64. pep 
orf 64ng 
orf 6 4. pep 
orf64ng 



MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60 
| | | 1 | | 1 | | || I I 11 1 I I I I I I I I I I I I I I I M : I 1 I I I I I I I I 1 I I I I II I I I I I 

MRRFLPI AAICAWLLYGLTAATGSTSSIAD YFWW I VS FSAMLLLVLSAVLARYVI LLLK 60 

DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 120 

1)1:11111 11 IIMM IICIIII: II I I I I I 1 I I 1 I 1 M I I I i 1 I I I I I 

DRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN 119 

LSKSALNLAADNALGNAVPVQI DLI GAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 180 

t | I I I 1 1 I 1 I I I I - r | { | 1 | I t 1 I I I : I I I l:M II 11 i I I I M M I I M I IMMI 

LSKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIE 179 

KS INPHKLDQP FPGKARWEKIQRAGS VRDLES IGGVLYAQGWLS AGTHXGRDYALFFRQP 240 

|||il|::|H:l I : II : 1 1 : : I I 1 h I I I I i II I 1 I M I I I I I II IMMMMM 

KSINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQP 239 
orf 64. pep
orf 64ng 
orf64.pep 
orf64ng 
orf 64. pep 
orf64ng 



VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSI FLALVMALY FARRFV 300 

: I:: II :| I I I I IN II i I M I I MM I I I I Ml:l I I I M I I I I I I I I I I II 1 t I I I I i 

I PENVAQDAVLI EKARAKYAELSYSKKGLQT FFLVTLLIASLLS I FLALVMALYFARRFV 299 

EPVLSLAEGAI^VVAQGDFSQTRPVIJIKDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 360 

I I r I I I I { t I I I I I I I I I I I I I I I I I I 1 I 1 t 1 I I I I I I I I I I M I I S : I I I I I K I fl I I I 

EPI LSLAEGAKAVAQGDFSQTRPVLRN DE FGRLTKLFNHMTEQLS I AKEADERNRRREEA 359 

ARHYLECVLEGLTTGVWFDEQGCLKT FNKAAGT 394 
I I I I I I 1 II : I I I I I 1 I I :l :! 

ARHYLECVLDGLTTGWVSYPLSCCRTAVFSTCHSSPLSYF 4 00 



An ORF64ng nucleotide sequence <SEQ ID 255> was predicted to encode a protein having amino
acid sequence <SEQ ID 256>:



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVSYP LSCCRTAVFS TCHSSPLSYF*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 257>: 



1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA
51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT
101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG
251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC
301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG
351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA
401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG
451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA
501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC
551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT
601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA
651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC
701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG
751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG
801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA
851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA
901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT
951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA
1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC
1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT
1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT
1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA
1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG
1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT
1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT
1451 GGGGTGAAGT GGCGAAGOGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA
1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG
1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG
1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC
1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG
2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC
2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G
This corresponds to the amino acid sequence <SEQ ID 258; ORF64ng-1>:



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA
551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK
701 TVETYA*



ORF64ng-1 and ORF64-1 show 93.8% identity in 706 aa overlap:






orf64ng-l.pep 
orf64-l 



orf64ng-l.pep 
orf64-l 



orf64ng-l.pep 
orf64-l 



orf64ng-l.pep 
orf64-l 



orf64ng-l.pep 
orf64-l 



orf64ng-l.pep 
orf64-l 



orf64ng-l.pep 
orf64-l 



orf 64ng-l.pep 
orf64-l 



10 20 30 40 50 60 

MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 

| 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 K 1 1 I I 1 1 ! 1 1 1 1 1 1 1 1 1 1 * I M ir 
M^kpiAAlCAWLLYGLTAATGSTSSI^ 

20 30 40 50 60 



10 



110 



120 



70 80 90 100 

DRRNGVFGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNL 

Ml* I 1 1 1 II I M II I I M N 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 M 1 1 1 1 M II I 1 1 1 1 1 M I 
D^DGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

70 80 90 100 110 120 



100 



110 



170 



180 



130 140 150 160 

SKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLAL YNAASGKIEK 

III 1 1 r 1 1 1 1 1 1 r = 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 MM H U M I II M 1 1 I I Ml 1 1 1 1 1 1 1 

SKSA^IAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLAL YNAASGKIEK 
130 140 150 160 170 180 



230 



240 



190 200 210 220 

SINPHOFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI 

Mill:: ||: I I : 1 1 : M : : I M I : M M I M I M M M 1 1 M M M I I M I M M : 
SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLS 

fc 200 210 220 230 240 



190 



250 260 270 280 290 300 

PENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYF^RHVE 
I • • 1 | : || M II II II I M I M I 1 1 M II II II : II 1 1 1 1 1 1 II I II M II II M I II 1 1 
PK^AEDAVLIEKARAKYAELS YSKKGLQTFFLATLLI ASLLS I FLALVMALYFARRFVE 

250 260 270 280 290 300 



350 



360 



310 320 330 340 

PILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

I : || I 1 II II M I 1 1 1 II I I II 1 ! I I i I II 1 1 I M 1 1 1 1 I 1 1 I M I 1 1 1 1 I 1 1 II II 1 1 I 
PVLS LAEGAKAVAQGD FSQTRPVLRNDE FGRLTKLFNHMTEQLS I AKEADERNRRREE AA 
320 330 340 350 360 



310 



420 



370 380 390 400 410 

RHYLECVLDGLTTGWVFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL 

Mill lll:IM I III II 1 1:1 I III M I MMIl IN: III 111 II M Mil III IN 
MYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

370 380 390 400 410 420 

430 440 450 460 470 480 

AEVFAAIGAAAGTDKPVQVEYAAPDDAKI LLGKATVLPEDNGNGWMVI DDITVLIRAQK 
YTlltMIIIMIMIIMMiMillMllMIIIIIIMIIIIMIlllMllhIM 
AEVFAAIGAAAGT DKPVHVKYAAPDDAKI LLGKATVLPEDNGNGWMVI DDITVLIHAQK 

430 440 450 460 470 480 



490 500 510 520 530 540 
orf64-1 EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK

490 500 510 520 530 540 






orf64ng-l.pep 
orf64-l 



orf 64ng-l.pep 
orf64-l 



orf64ng-l.pep 
orf64-l 



550 560 570 580 590 600 

EMVEAFRNYARAPSLKLENQDLNALIGDVIJUjYEAGPCRFEAEIAGEPLMMAADTTAMRQ 
I I I I I I I I I I I : I I I I I I I I I I I I I 1 I I I It I I i 1 I I I I I llllllli = I f I I E I I I I 
EMVEAFRNYAR S P S LKLENQDLNAL I G D VLALYEAG PCRFAAELAGE PLT VAADTTAMRQ 

550 560 570 580 590 600 

610 620 630 640 650 660 

VLHNIFKNAAEAAEEADMPEVRVKSETGQDGRIVLTVCDNGKGFGKEMLHNAFEPYVTDK 
I 1 1 I i I I I I I I I i 1 1 M: I II I I I I I I I I I I I I I I M II i I I i I !:]{ I I I I I I I I I I 1 I 
VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 

670 680 690 700 

PAGTGLGLPWKKIIGEHGGRISLSNQDAGGACVRIILPKTVETYAX 
I I M I I I M I I I I I I I III I II I III II II I II Mill 111:111 I 
PAGTGLGLPWKKI IEEHGGRI SLSNQDAGGACVRI ILPKTVKT YAX 
670 680 690 700 






Furthermore, ORF64ng-1 shows significant homology to a protein from A.caulinodans:

sp|Q04850|NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi|77479|pir||S18624 ntrY
protein - Azorhizobium caulinodans >gi|38737 (X63841) NtrY gene product
[Azorhizobium caulinodans] Length = 771
Score = 218 bits (550), Expect = 7e-56

Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)

IAAI CAWLLYGLTAATG ST S S LADY FWW IXXXXXXXXXXXXXXXXRYVI LLLKDRRNGV 66 
I+A+ ++L GLT + + + R + + K R G j 

ISALATFLILMGLTPWPTHQWIS VLLVNAAAVLILSAMVGREIWRIAKARARGR 90 j 

FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 126 

+++ R+ G+F +V+V+P + + -H-+ ++ ++ WF T E + S++++++ + 
AAARLHIRIVGLFAWSWPAILVAWASLTLDRGLDRWFSMRTQEIVASSVSVAQTYVR 150 





Query: 


7 


30 


Sbjct: 


35 




Query: 


67 


35 


ok-i . 
OD] Cu > 


Q1 




Query: 


127 




Sbjct: 


151 


40 


Query: 


185 




Sbjct: 


201 


45 


Query: 


234 




Sbjct: 


257 




Query: 


292 


50 


Sbjct: 


317 




Query: 


351 


55 


Sbjct: 


377 




Query: 


411 




Sbjct: 


435 


60 


Query: 


468 




Sbjct: 


489 


65 


Query: 


528 




Sbjct: 


548 




Query: 


588 


70 


Sbjct: 


608 



A N 



+ + + DL 



S+ 



Y G S F Q+ AA + ++ 
-YEGDRSRFNQILTAQAALRNLPGAMLI 



200 



+ D + ++ + 



L+ + I 



I + V + +IG 



N DY 



++ A Y L + G+Q F + 



L F++ V PI L A VA+G+ 



P+R++ L+FNMT+L 



+ E VL G+ GV+ D + R+ N++AE++LG L+ + 



RH 



V LL E + VQ D + + V E + +G V+ 

2WPETAGLLEEA EHARQRS VQGN I TLTRDGRERV FAVRVTTEQS PEAEHGWW 488 

/IDDITVLIRAQKEAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTR 527 
+DDIT LI AQ+ +AW + VA+R+ AHE I +N PLT P I QLS AERL K G + QD +1 + 



TDTII+QV + MV+ F ++AR P +++QD++ +1 + L 



PMA D +QLNIKN 



P+VR + 



- SETGQDGR I VLT VCD 639 
+ G+D +V+ + D 
Query: 640 NGKGFGKEMLHNAFEPYVTDKPAGTGLGLPVVKKIIGEHGGRISLSNQDAG-GACVRIIL 698

ttSS +E + EPYVT + GTGLGL +V KI+ EHGG I L++ G GA +R+ L 
Sbjct: 665 NGTGLPQESRNRLIiEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 31 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 259>: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG 

451 CACGCGTTGG ATACG. . . 

This corresponds to the amino acid sequence <SEQ ID 260; ORF66>: 

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP 
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA 
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG 
151 HALDT... 

Further work revealed the complete nucleotide sequence <SEQ ID 261>:

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC 

451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 262; ORF66-1>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*
Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical protein o221 of E.coli (accession number P37619)

ORF66 and o221 protein show 67% aa identity in 155aa overlap:

orf66   1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
          M  F+  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
o221    1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

orf66  61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
          RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
o221   61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

orf66 121 IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
          +GQILD+ VFN+LR+ + WW+AP AST+ G+  DT
o221  121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155

Homology with a predicted ORF from N.meningitidis (strain A)

ORF66 shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 66 .pep MYAFTAAQQQKALFRLVLFHILI IAASNYLVQFPFQI FG I HTTWGAFS FPFI FLATDLTV 
IMIiltllMIII I INI I M I , IN I II I I I I I I | | | | | | 1 | | 1 | | | | | f f | | | | | 
orf 66a MYAFTAAQQQKALFWLVLFHI LI IAASNYLVQFPFQI SGIHTTWGAFS FPFI FLATDLTV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 66 . pep RIFGSHLAR RIIFWVMFPALLLSYVFS VLFHNGSWTGLGALSEFNTFVGRI ALASFAAYA 
I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II IN III! 
orf 66a RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA

70 80 90 100 110 120 

130 140 150 i 

orf 66. pep I GQI LD I F VFNKLRRLKAWW I APN AS T VI GHALDT I 
: I I I I I I I I I I I I I I I I I I I : I I : I I I I I t : I I I I I 
orf 66a LGQ I LD I FV FNKLRRLKAWWVAPTAS TVIGNALDTLVFFAVAFY AS S DG FMAANWQG I AF 

130 140 150 160 170 180 

orf 66a VDYLFKLT VCGLFFLPAYGVILNLL TKKLTTLQTKQAQDRPAPSLONPX 
190 200 210 220 
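As a rough check on the identity figure quoted for this comparison, the ungapped percent identity can be recomputed directly from the two listed sequences. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, the ORF66 residues are taken from <SEQ ID 260> and the ORF66a residues from the first 155 positions of <SEQ ID 264>, and it assumes a simple position-by-position comparison rather than the alignment program used in the original analysis.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions carrying the same residue in an ungapped comparison."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Partial ORF66 (155 aa), written in groups of ten as in the listing.
ORF66 = (
    "MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP "
    "FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA "
    "LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG "
    "HALDT"
).replace(" ", "")

# First 155 residues of ORF66a (strain A).
ORF66A_FIRST_155 = (
    "MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP "
    "FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA "
    "LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG "
    "NALDT"
).replace(" ", "")

identity = percent_identity(ORF66, ORF66A_FIRST_155)
print(f"{identity:.1f}% identity over {len(ORF66)} aa")
```

Under these assumptions the two sequences differ at six of 155 positions, which is consistent with the 96.1% stated for the ORF66/ORF66a comparison.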

The complete length ORF66a nucleotide sequence <SEQ ID 263> is: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC 

451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG 

601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 264>: 

1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFS FP 

51 FI FLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSV LF HNGSWTGLGA 

101 LSEFNTFVGR I ALASFAAYA LGQILDIFV F NKLRRLKAWW VAPTAS TVIG 

151 NALDTLVFFA VAF YASSDGF MAANWQGIAF VDYLFKL TVC GLFFLPAYGV 

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP* 

ORF66a and ORF66-1 show 97.8% identity in 228 aa overlap: 

10 20 30 40 50 60 

orf 66a . pep MYAFTAAQQQKALFWLVLFHILI IAASNYLVQFPFQI SGIHTTWGAFSFPFI FLATDLTV 
1 I M 1 1 1 1 I I I I I i I I I I I I I I I I I I I I I I I I I I I I IIIMIIIillllHMIIIII 
orf 66-1 MYAFTAAQQQKALFRLVLFHILI IAASNYLVQFPFQI FG I HTTWGAFSFPFI FLATDLTV 
orf66a.pep VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX

orf66-1    VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX
190 200 210 220 
Homology with a predicted ORF from N. gonorrhoeae

ORF66 shows 94.2% identity over a 155aa overlap with a predicted ORF (ORF66.ng) from N.
gonorrhoeae:
orf66ng

orf66.pep 

orf66ng 

or£66.pep 

orf66ng 



RIFGSHI^IIF^FPALU,SYVFSVLFHNGSWTG^ 

IGQ I LD I FVFNKLRRLKAWW I APNASTVI GHALDT 

:| II Mill I: I I 1 1 1 Ml I II I 
LGQILDIFVFDKLRRLKAWWIAR 



I I II I III I : JJ 1 1 j J_| JJ.i i i,^^^i^^DTLVFEAVAFYASSDEFMA7UWQGIAF 
60
120 
120 
180



The complete length ORF66ng nucleotide sequence <SEQ ID 265> is: 



1 ATGTACGCAT TGACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT
51 GCTTTTCCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC
101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC
151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT
201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat
251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG
301 ctgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC
351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC
401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC
451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG
501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC
551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG
601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC
651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA



This encodes a protein having amino acid sequence <SEQ ID 266>: 

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

An alternative annotated sequence is: 

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*
ORF66ng and ORF66-1 show 96.1% identity in 228 aa overlap:

orf 66-1 .pep MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFI FLATDLTV 60 

111:111 I I I I I I M I I I I I I M MM I II I I Ml: I I I I I I I I I I I i I I I I I I I I I I I I 
orf66ng MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFI FLATDLTV 60 

orf 66-1 .pep RI FGSHLARRI I FWVMFPALLLS YVFSVLFHNGSWTGLGALSE FNT FVGRI ALAS FAAYA 120 

I M 1 I I f 1 1 1 I 1 1 I M I I M I 1 1 I 1 1 1 1 ) M 1 1 1 1 I 1 1 It I I : M I M M f I t 1 II 1 1 I I 
orf66ng RI FGSHLARRI I FWVMFPALLLS YVFSVLFHNGSWTGLGALSQ FNT FVGRI ALAS FAAYA 120 

10 orf 66-1. pep IGQ I LDI FVFNKLRRLKAWW I APTASTVIGNALDTLVFFAVAFYAS S DG FMAANWQG I AF 180 

: I I I I I II II : II I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
orf66ng LGQI LDI FVFDKLRRLKAWW I APAASTV IGNALDTLVFFAVAFYAS S DE FMAANWQG I AF 180 

or f 66-1 . pep VDYLFKLTVCTL FFL PAYGVI LNLLTKKLTTLQTKQAQDRPAP S LQN PX 229 
15 I I I I I I I I M M I I I I I I I I I I M I I I I I I : i I I I I ! I ! I I : I I I M | l< 

or f 6 6ng VDYLFKLTVCTL FFL PAYGVI LNLLTKKLTALQTKQAQDRPVPSLQNPX 22 9 

Furthermore, ORF66ng shows significant homology with an E.coli ORF: 

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC
REGION (O221)

>gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423)
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli]
Length = 221
Score = 273 bits (692), Expect = 5e-73

Identities = 132/203 (65%), Positives = 155/203 (76%)

Query: 1   MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60
           M   +  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
Sbjct: 1   MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

Query: 61  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120
           RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
Sbjct: 61  RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

Query: 121 LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180
           LGQILD+ VF++LR+ + WW+AP AST+ GN  DTL FF +AF+ S D FMA +W  IA
Sbjct: 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDTLAFFFIAFWRSPDAFMAEHWMEIAL 180

Query: 181 VDYLFKLTVCTLFFLPAYGVILN 203
           VDY FK+ +  +FFLP YGV+LN
Sbjct: 181 VDYCFKVLISIVFFLPMYGVLLN 203
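The percentages in the BLAST summary line can be reproduced from the counts it reports. The sketch below is only illustrative: the helper name `blast_percent` is hypothetical, and rounding to the nearest whole number is an assumption about how the report was formatted, not a statement about the BLAST implementation.

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Whole-number percentage of aligned positions (rounding is an assumption)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return round(100 * count / aligned_length)


# Counts quoted for the ORF66ng / o221 comparison: 132 identities and
# 155 positives over 203 aligned positions.
identities = blast_percent(132, 203)
positives = blast_percent(155, 203)
print(f"Identities = 132/203 ({identities}%), Positives = 155/203 ({positives}%)")
```

The same helper applied to the counts quoted for the A.caulinodans comparison (195, 320 and 58 over 720 positions) gives whole-number values that agree with that summary line.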

Based on this analysis, including the homology with the E.coli protein and the presence of several
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

Example 32 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 267>: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC

351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA 

451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA 
501 TGGCTGCTAC GGCGTTGAT...

This corresponds to the amino acid sequence <SEQ ID 268; ORF72>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VXISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE 

151 YSNCLWYEDK RRINRTYGCY GVD. . 
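The letters y, m, w and r in <SEQ ID 267> appear to be IUPAC ambiguity codes, and the X residues in ORF72 fall at the codons that contain them. The sketch below illustrates that relationship under stated assumptions: it uses the standard genetic code and the standard IUPAC nucleotide codes, the function name `translate_ambiguous` is hypothetical, and it is not the software used in the original analysis. A codon is reported as a definite residue only when every possible expansion encodes the same amino acid; otherwise it is reported as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes mapped to the bases they may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate in frame 1; emit X when an ambiguous codon is not unique."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Expand every ambiguity code and collect the residues that result.
        residues = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[b] for b in codon))
        }
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Nucleotides 76-99 of <SEQ ID 267> (codons 26-33), containing y, m, w and r.
fragment = "GCGAATGCAAAyGCAGTmwrAATA"
print(translate_ambiguous(fragment))
```

For this fragment the codons AAy and GTm each resolve to a single residue, while wrA does not, so the sketch yields one X at the position where ORF72 shows X in the listing.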

Further work revealed the complete nucleotide sequence <SEQ ID 269>: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This corresponds to the amino acid sequence <SEQ ID 270; ORF72-1>:

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF72 shows 98.0% identity over a 147aa overlap with an ORF (ORF72a) from strain A of N. 
meningitidis: 



or f 7 2. pep 
orf72a 

orf72.pep 
orf72a 



10 20 30 40 50 60 
MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS
1 || MM HIM II I I MM I III! I 1 I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I 
MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
10 20 30 40 50 60 

70 80 90 100 110 120 

DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

| 1 1 1 1 1 1 1 1 I llllllinilMMMIIIIIIIIIMMMMIIMIIIIIIIMII 
DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 
70 80 90 100 110 120 



130 140 150 160 170 

or f 7 2 pep HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 

1 ! I I I I t I I I I t I I I I I M I I I S 1 1 : I 
orf72a HDVYETFKED I QARGYQYDPETDKFAKVSGX 

130 140 150 

The complete length ORF72a nucleotide sequence <SEQ ID 271> is:

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This encodes a protein having amino acid sequence <SEQ ID 272>: 
1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 

ORF72a and ORF72-1 show 100.0% identity in 150 aa overlap: 

                    10        20        30        40        50        60
orf72a.pep  MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1     MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf72a.pep  DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1     DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                    70        80        90       100       110       120

                   130       140       150
orf72a.pep  HDVYETFKEDIQARGYQYDPETDKFAKVSGX
            |||||||||||||||||||||||||||||||
orf72-1     HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                   130       140       150

Homology with a predicted ORF from N. gonorrhoeae

ORF72 shows 89% identity over a 173aa overlap with a predicted ORF (ORF72.ng) from N.
gonorrhoeae:

orf72.pep   MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 60
            || | |||||||||||||||||||||||||| |||| ||||||||| |||||| |  |||
orf72ng     MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 60

orf72.pep   DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120
            || | ||||| |||||||||||||||||||||| ||||| | |||| |||||||||||||
orf72ng     DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 120

orf72.pep   HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 173
            |||||||||||||||  |||||||||||||| ||||||| |||||||||||||
orf72ng     HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 180

An ORF72ng nucleotide sequence <SEQ ID 273> was predicted to encode a protein having amino 
acid sequence <SEQ ID 274>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE 

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS 

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP 

301 GTKVNMGPVT DRNGNPVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA 

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG 

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF 

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA 

501 EKIRFAVLLA FIIMSAFWFG SLGGE*

After further analysis, the following gonococcal DNA sequence <SEQ ID 275> was identified: 

1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA 

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT

This corresponds to the amino acid sequence <SEQ ID 276; ORF72ng-1>:

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF

ORF72ng-1 and ORF72-1 show 89.7% identity in 145 aa overlap:

                      10        20        30        40        50        60
orf72ng-1.pep MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS
              || | ||||||||||||||||||||||||||||||| ||||||||| |||||| |  |||
orf72-1       MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf72ng-1.pep DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA
              || | |||||||||||||||||||||||||||| ||||| | |||| |||||||||||||
orf72-1       DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                      70        80        90       100       110       120

                     130       140
orf72ng-1.pep HDVYETFKEDIQARGCRYDPETDKF
              |||||||||||||||  ||||||||
orf72-1       HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                     130       140       150
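
The 89.7% figure quoted for ORF72ng-1 and ORF72-1 can be reproduced by counting identical positions over the 145-residue overlap. The sketch below is a minimal illustration, not the alignment program used in the source (which is not named in this section). It assumes the two sequences are already in register with no gaps, which holds for this particular pair, and it takes the first 150 residues of <SEQ ID 272> as the ORF72-1 sequence, relying on the statement that ORF72a and ORF72-1 are 100.0% identical over 150 aa. The function name `percent_identity` is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over the shared length of two gap-free,
    already-registered sequences (no alignment step is performed)."""
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("empty sequence")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / overlap


# ORF72ng-1 <SEQ ID 276>, 145 residues.
ORF72NG_1 = (
    "MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKF"
    "VPKSSNIYSSDLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLV"
    "RQGAKFGTRAVPYVGTALLAHDVYETFKEDIQARGCRYDPETDKF"
)
# ORF72-1, taken here from <SEQ ID 272> (assumption stated in the lead-in).
ORF72_1 = (
    "MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKF"
    "VPKNSKTYSSDLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLA"
    "RLGAKFSTRAVPYVGTALLAHDVYETFKEDIQARGYQYDPETDKFAKVSG"
)

overlap = min(len(ORF72NG_1), len(ORF72_1))
print(f"{percent_identity(ORF72NG_1, ORF72_1):.1f}% identity in {overlap} aa overlap")
```

Sequences that need gaps to be brought into register would require a real alignment step before this count is meaningful.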

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N.meningitidis and
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 33 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 277>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCAAACCGGG

151 GCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG

201 TATCCGTTTA TCAGATGTTG TGGCCTATC..

This corresponds to the amino acid sequence <SEQ ID 278; ORF73>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 
51 LTGLLLAGAA MRSGGKVSVY QMLWPI.. 

Further work revealed the complete nucleotide sequence <SEQ ID 279>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG

301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT

351 TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT 

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA 
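
A partial sequence such as <SEQ ID 277> ends in unread bases, whereas a complete open reading frame should start with ATG, contain a whole number of codons, and end with a single stop codon. The sketch below runs these checks on the complete ORF73-1 listing <SEQ ID 279>. It is an illustration under stated assumptions: frame 1 is assumed, the helper name `orf_summary` and its dictionary keys are hypothetical, and the source does not describe how completeness was established.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_summary(dna: str) -> dict:
    """Basic frame-1 sanity checks for a candidate open reading frame."""
    seq = "".join(dna.split()).upper()  # drop the spacing used in the listings
    whole = len(seq) % 3 == 0
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return {
        "length_nt": len(seq),
        "whole_codons": whole,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": whole and bool(codons) and codons[-1] in STOP_CODONS,
        # 1-based nucleotide positions of in-frame stops before the last codon
        "internal_stops": [
            3 * i + 1 for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS
        ],
    }


# ORF73-1 <SEQ ID 279>, copied from the listing above (row numbers omitted).
ORF73_1_DNA = """
ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT
GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT
TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG
CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT
ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT
GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG
CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT
TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG
ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT
TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA
"""

for key, value in orf_summary(ORF73_1_DNA).items():
    print(f"{key}: {value}")
```

A listing that fails `whole_codons` or `ends_with_stop` is not necessarily wrong; it may simply be partial, as the source indicates for several sequences.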

This corresponds to the amino acid sequence <SEQ ID 280; ORF73-1>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG

51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR
151 SRNAIEHKKD E*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF73 shows 90.8% identity over a 76aa overlap with an ORF (ORF73a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf73.pep   MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA
            ||||||||||||||||||||||||||||||||||||| ||||| ||| ||| ||||||||
orf73a      MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA
                    10        20        30        40        50        60

                    70
orf73.pep   MRSGGKVSVYQMLWPI
            ||||| |||| ||| |
orf73a      MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM

The complete length ORF73a nucleotide sequence <SEQ ID 281> is:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG

401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA 

This encodes a protein having amino acid sequence <SEQ ID 282>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG

51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR 

151 FRNAXEHKKD E* 

ORF73a and ORF73-1 show 91.3% identity in 161 aa overlap 

                    10        20        30        40        50        60
orf73a.pep  MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA
            ||||||||||||||||||||||||||||||||||||| ||||| ||||||||||||||||
orf73-1     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf73a.pep  MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM
            |||||||||| ||| ||||||||| ||||||||| |||| ||||||||||||||||||||
orf73-1     MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
                    70        80        90       100       110       120

                   130       140       150       160
orf73a.pep  NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX
            | |||| | |||||||||||||| |||| | ||| |||||||
orf73-1     NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX
                   130       140       150       160

Homology with a predicted ORF from N. gonorrhoeae 

ORF73 shows 92.1% identity over a 76aa overlap with a predicted ORF (ORF73.ng) from N. 
gonorrhoeae: 



orf73.pep   MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60
            ||||||||||||||||||||||||||||||||||||| ||||||||| ||| ||||||||
orf73ng     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60

orf73.pep   MRSGGKVSVYQMLWPI 76
              | ||||||||||||
orf73ng     VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 120

The complete length ORF73ng nucleotide sequence <SEQ ID 283> is: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC

101 TAATGGCGGC AACCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG

151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT

201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT

251 CTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG 

301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT 

351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg 

401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt 

451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA 

This encodes a protein having amino acid sequence <SEQ ID 284>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG

51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR

151 SRNAIEHEKD E*

ORF73ng and ORF73-1 show 93.8% identity in 161 aa overlap:

                    10        20        30        40        50        60
orf73-1.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA
            ||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
orf73ng     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf73-1.pep MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
              | | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf73ng     VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
                    70        80        90       100       110       120

                   130       140       150       160
orf73-1.pep NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX
            |||||||||  |||||||||||| | ||||||||||| ||||
orf73ng     NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX
                   130       140       150       160

Based on this analysis, including the presence of a putative leader sequence and putative
transmembrane domain in the gonococcal protein, it is predicted that the proteins from
N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
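
The prediction above rests on a putative leader sequence and a putative transmembrane domain, but this section does not say which program identified them. As a rough illustration only, hydrophobic stretches of the kind found in leader peptides and membrane-spanning segments can be located with a sliding-window average of Kyte-Doolittle hydropathy values. In the sketch below the window length of 19 and the cutoff of 1.6 are conventional choices assumed for the example, the function names are hypothetical, and residues outside the 20 standard amino acids (such as `X`) are scored as 0.0 by assumption. It is a stand-in technique, not the method used in the source.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of every window of the given length along the protein."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """1-based start positions of windows whose mean hydropathy meets the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [start + 1 for start, mean in enumerate(profile) if mean >= threshold]


# First 60 residues of ORF73ng <SEQ ID 284>, as read from the listing.
ORF73NG_NTERM = "MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA"

print(hydrophobic_windows(ORF73NG_NTERM))
```

Runs of consecutive start positions in the output mark candidate hydrophobic segments; deciding whether such a segment is a cleavable leader or a membrane anchor needs additional criteria that this sketch does not attempt.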

Example 34 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 285>: 

1 ATGTTTGTTT TTCAGACGGC ATTCTT.ATG TTTCAGAAAC ATTTGCAGAA 

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA 

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG 

101 TCGGCAATTT GbU AGACACGCGC GTTACCGCAC AGCTTTTGAG 

201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC

451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA
501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA

551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 

601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 

651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG 

701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA 

751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC 

801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC AAAATCACGG 

851 GCGAGGGAAA GAAAGCTTTG TACGAT. . 

This corresponds to the amino acid sequence <SEQ ID 286; ORF75>: 



1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 AXXXXAEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KITGEGKKAL YD. . 

Further work revealed the complete nucleotide sequence <SEQ ID 287>: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 288; ORF75-1>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF75 shows 95.8% identity over a 283aa overlap with an ORF (ORF75a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf75.pep   MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKAXXXXAEDTR
                     ||||||||||||||||||||||||||||||||||||||||||    |||||
orf75a               MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR
                             10        20        30        40        50

                    70        80        90       100       110       120
orf75.pep   VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75a      VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR
                   60        70        80        90       100       110

                   130       140       150       160       170       180
orf75.pep   RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV
            |||| |||||||||| ||||||||||| |||||||||||||||||||||||||| ||| |
orf75a      RVREVGFKVVPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVV
                  120       130       140       150       160       170

                   190       200       210       220       230       240
orf75.pep   MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM
            |||||||||| |||||||||||||||||||||||||||||||||||||| ||| ||||||
orf75a      MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM
                  180       190       200       210       220       230

                   250       260       270       280       290
orf75.pep   VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75a      VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
                  240       250       260       270       280       290

The complete length ORF75a nucleotide sequence <SEQ ID 289> is:

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC
51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC
101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG
151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT
201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT
251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG
301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG
351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA
401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG
451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC
501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG
551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA
601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA
651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG
701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG
751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC
801 GGAGCTTGCC GCCAAAATCA CGGGCGAGGG AAAAAAAGCT TTGTACGATC
851 TGGCACTGTC TTGGAAAAAC AAATGA

This encodes a protein having amino acid sequence <SEQ ID 290>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP

151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75a and ORF75-1 show 98.3% identity in 291 aa overlap: 

                    10        20        30        40        50        60
orf75a.pep  MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75-1     MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf75a.pep  GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||
orf75-1     GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf75a.pep  VPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIG
            |||||||||||||||||| |||||||||||||||||||||||||| ||| ||||||||||
orf75-1     VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf75a.pep  ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD
            |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf75-1     ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
                   190       200       210       220       230       240

                   250       260       270       280       290
orf75a.pep  EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75-1     EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
                   250       260       270       280       290

601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC
651 CGGacGCGCc gactaCAATC AGGTTtcctt cCAAAAacTc aacctgATta
701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT
751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC
1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGgcgGA AAATGA

This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-1>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*



ORF101ng-1 and ORF101-1 show 97.6% identity in 371 aa overlap:
10 20 30 40 50 60

orf 101-1 . pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 

I I! II I I Mill IN 11 II II lllli III III I I I I I I I I I I I I I I I I ( I I I I 

orfl01ng-l MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 

10 20 30 40 50 60 , 

70 80 90 • 100 110 120 j 

orf 101-1 . pep PLLLVLTAFI STLTVLTRYWRDSEMSVWLSCGLALKQWI RPVMQFAVPFAVLVAVMQLWV I 

Ml I Ml I II II I II II M I I I I I II I II II 111 I M I M I I I M I tlll:|:ilttlll 
orfl01ng-l PLLLVLTAFI STLTVLTRYWRDSEMSVWLSCGLALKQW I RPVMQFAVPFAI LI AVMQLWV 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 101-1 . pep IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 
I I I I I I I I I I I I I I I I I I I I I I I I I i I I II: I II I I I I I I I I II II M I I I I I I I I I I I I 
Orfl01ng-1 IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 101-1 .pep DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 
I M I I I I I 1 1 I I I 1 1 i I I : I I 1 1 I I M I I I I I I I I I I I I I I I I I I I I I I M I I II I I I I I 
orfl01ng-l DKNGGDNIIFAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 101-1. pep IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 
I II Mill M I I III III I I Mil I II MM II I I MM Mill I I III Mil III I II 
orfl01ng-l IDPVSHRRTISTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 101-1 .pep LIAIGLFLIYQNGLTLLFEAVEDGKIRFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 
I M I M I 1 I I K I 1 M I I I M M I I I I I I I M r I M M I 1 I : t I : : I I I r I I I I M M I I I 
orfl01ng-l LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 

310 320 330 340 350 360 

370 

orf 101-1 .pep VGKSLTLKGGKX 
MINIMUM 
orfl01ng-l VGKSLTLKGGKX 

370 

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 60 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 507>: 

1 GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC 

51 gcaatatcaa GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT 

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT 

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT 

251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC 

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG 

351 TAT.GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC

401 ATTCGTAA 

This corresponds to the amino acid sequence <SEQ ID 508; ORF113>:

1 GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNVVIAGHG LDARDTDYTR

51 ILSYHSKIDA PVWGQDVRVV AGQNDVAATG DAHSPILNNA AANTSNNTAN

101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS*

Computer analysis of this amino acid sequence gave the following results:
Homology with the pspA putative secreted protein of N.meningitidis (accession AF030941)
ORF113 and pspA show 44% aa identity in 179aa overlap:

or f 11 3 GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNWIAGHGLDARDTDYTRILS YHSKIDA 60 

GGG INA+ TLT+ P G+L+ F + G WI G GLD D DYTRILS ++I+A 
pspa GGGLIN AAS VT LT S GV P VLNNGN LTG FDVS SGKWI GGKG L DT S DADYTRI L S RAAE IN A 256 

oc orfU3 PVWGQDVRWAGQNDVAATG DAHS PI LXXXXXXXXXXXXXXGTHI PLFAI DTGKLGGMYA 120 
VWG+DV+W+G+N + G + P AIDT LGGMYA 

pspa GVWGKDVKWSGKNKLDFDG S LAKT AS APS S S D S VT PTVAI DTAT LGGMYA 307 

orf 113 NKITLI STVEQAGIRNQGQWFASAGNVAVNAEGKLVNTGMI AATGENHAVS LHARNVHN 17 9 
-5ft +KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A +++ A+ V N 

DV pspa DKITLI STDNGAVI RNKGR I FAATGGVT LS ADGKLSN SG S I DAA EITISAQTVDN 362 

Homology with a predicted ORF from N.gonorrhoeae

ORF113 shows 86.5% identity in 52aa overlap at the N-terminal part and 94.1% identity in 17aa
overlap at the C-terminal part with a predicted ORF (ORF113ng) from N. gonorrhoeae:

« ■* - GGGFINASCATLTTAKPQYQAGDLSAFKIR 30 

° rtli | I | I I I I I 1 1 I I I I I I I I I I : I : I I I I 

orfll3ng SHPSQLNGYIEVGGRRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 224 

40 orfll3 QGNWI AGHGLDARDTDYTRILS YHSKI DAPVWGQDVRWAGQNDVAATGDAHS PILNNA 90 

|||:MI If llll 1111:1 IN 
orfll3ng QGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 

nym *<\ 1 ^ IDTGKLGGXVCQQNHLDQYGRASRHS 135 

0rIi I 1 I 1 I I I I i I I 1 : I I I I 

orfll3ng DFSGFKIRQGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 

The complete length ORF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a 
protein having amino acid sequence <SEQ ID 510>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

Based on this analysis, it is predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 61 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 511>:

1 ..TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG 

51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA 

101 GCCATCATGC GCGCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT 

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT 

201 ATACATTATC AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC 

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT 

501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC

551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA

601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT

651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT

701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA

751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC

801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT

851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT

901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

951 TATCACAGGC AAAGAAAAAG GTGTTT. . 

This corresponds to the amino acid sequence <SEQ ID 512; ORF115>:

1 . . STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI 

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS 

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP 

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS 

301 QNTQGSSTYL DRMAGIYITG KEKGV. . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N.meningitidis (accession number AF030941)
ORF115 and pspA protein show 50% aa identity in 325aa overlap:

Orf115: 1    STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60
             STG+S Y E++ +I +G AY+ + + P + NGI +T
pspA:   778  STG YSRS PYE PAPEVS -S I RMG I SAYKG YAPQQAS DI PGTW PWAENG IHPT FT 831

Orf115: 61   PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120
             LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+
pspA:   832  -LPNSSLFAI APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890

Orf115: 121  LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180
             L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV
pspA:   891  LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950

Orf115: 181  WLVQKEVKLPDGGTQTVLVPQVYVRVKNGDIDGKGALLSGSNTQINVSGSLKN-SGTIAG 239
             WL + V LPDG TQTVL P+VYVR + D++G+GALLSGS I SG+++N G I AG
pspA:   951  WLENETVTLPDGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAG 1009

Orf115: 240  RNALIINTDTLDNIGGRIHAQKSAVTATQDINNIGGMLSAEQTLLLNAGXXXXXXXXXXX 299
             R ALI+N +N+G++ ADING+AE LLL A
pspA:   1010 REALILNAQNIKNLQGDLQGECNIFAAAGSDITNTGS-IGAENALLLKASNNIESRSETRS 1068

Orf115: 300  XXXXXXXXXYLDRMAGIYITGKEKG 324
             + R+AGIY+TG++ G
pspA:   1069 NQNEQGSVRNIGRVAGIYLTGRQNG 1093

Homology with a predicted ORF from N.gonorrhoeae

ORF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORF115ng) from
N. gonorrhoeae:



orf115.pep                               STGHSEQNYTLPREITRNISLGSFAYESHRK 31
                                          ||| ||||||| |||| ||||||||||| |
orf115ng    NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71

orf115.pep  ALSHHAPSQGTELPQSN----------GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81
            ||| |||||||||||||          ||||||| ||||||| |||||||| ||||||||
orf115ng    ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131

orf115.pep  DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141
            ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf115ng    DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191

orf115.pep  EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201
            |||||||||||||||||||||||||||||| |||||||||||||||||||||||||| ||
orf115ng    EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251

orf115.pep  VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 261
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf115ng    VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311

orf115.pep  SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321
            |||||||||||||| |||||||||||||||| |||  |||| ||||||||||||||||||
orf115ng    SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371

orf115.pep  EKGV 325
            ||||
orf115ng    EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431



An ORF115ng nucleotide sequence <SEQ ID 513> was predicted to encode a protein having amino
acid sequence <SEQ ID 514>:

1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL
701 MPWRLPMQVG RLFKQAKAPK K*



Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>: 



1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG
51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG
101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT
151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA
201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT
251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT
301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT
351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC
401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC
451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA
501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC
551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT
601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT
651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC
701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA
751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT
801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT
851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA
901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC
951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT
1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT
1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA
1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA
1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC
1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA
1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA
1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG
1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG
1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC
1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC
1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC
1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG
1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT
1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG
1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG
1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC
1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT
1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG
1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA
1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC
2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG
2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA
2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA
2151 GGCGCACAAA ACTTAG



This corresponds to the amino acid sequence <SEQ ID 516; ORF115ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL
701 MPWRLPMQVG RPIKQAKAHK T*



This gonococcal protein (ORF115ng-1) shows 91.9% identity with ORF115 over 334aa:

20 30 40 50 60 70 

orf 115ng-l . p NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 

Ml I I I I I I I : I I II : M I I I I I H I I I 
orf 115 STGHSEQNYTLPREITRNISLGSFAYESHRK 

10 20 30 



80 90 100 110 120 130 

orf 115ng-l . p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 
II 1:1 I II I II HI I II I II II II I I III 11:111 Ml Ihllll III I 

orf 115 ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 

40 50 60 70 80 
140 150 160 170 180 190

orf 115ng-l . p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
MllillMIIMIMI 1 I I 1 1 t 1 I I I ! i 1 1 1 I I I t i I I I I I I 1 I f I I 1 1 1 1 i 1 1 I I f t 
orf 115 DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
90 100 110 120 130 140 

200 210 220 230 240 250 

orf 115ng-l . p EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 
|| IIIIIMIIMMI Mill II II I II 11:1 IHM! I Mill MM IN II IIM:M 
orf 1 15 EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 

150 160 170 180 190 200 

260 270 280 290 300 310 

orf 115ng-l .p VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 
I | | I | | I I i I I II I I II I I I I I I II I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I i 
orf 115 VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 
210 220 230 240 250 260 

320 330 340 350 360 370 

orfll5ng-l.p SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 
I I II I I I I I I I I I I : I II I I II I I I I I I I I I: I I I : I I I I : I I II I I I I I I II II I I I I 
orf 115 SAVTATQDINNIGGMLSAEQTLLLNAGNKINSQSTTASSQNTQGSSTYLDRMAGIYITGK 

270 280 290 300 .310 320 

380 390 400 410 420 430 

orf 115ng-l . p EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 
Mil 

orfllS EKGV 

In addition, it shows homology with a secreted N. meningitidis protein in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840 

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVG IALSAEQAAQLTS DI VWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVXMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

+ N+G++ ADINGIAE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078 

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILG 539 

+G + +DA K+TGRSGGG K +T + AST +GK+++L +G D + G 

Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 
Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378

Query: 659 QTYEQKGLTVAFSSPVTD 676
Q YEQKG+TVA S PV +

Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 62 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 517>:

1 ..TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG 

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA 

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT 

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC 

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG

351 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG

401 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC

451 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT

501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG

551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC

601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG.CTAAC 

651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA . . . 

This corresponds to the amino acid sequence <SEQ ID 518; ORF117>:

1 ..SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG

51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ ...

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N.meningitidis (accession number AF030941)
ORF117 and pspA protein show 45% aa identity in 224aa overlap:

Orf117: 4 NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63
++ +AAEV S G L ++A DI + AG T +DA K+TGRSGGG K +T ++

pspA: 1173 DIRIRAAEVGSEQGRLKLAAGRDIKVEAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQ 1232

Orf117: 64 HETAQSSTFEGKQVVLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123
+ A S T +GK+++L +G D + GSN+I+DN T + A N++ + +T+S+S ++
pspA: 1233 NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292

Orf117: 124 QKSGLM-SAGIGFTIGSKTNTQENQSQSNEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSS 182
+KSGLM S GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS
pspA: 1293 EKSGLMGSGGIGFTAGSKKDTQTNRSETVSHTESVVGSLNGNTLISAGKHYTQTGSTISS 1352

Orf117: 183 PEGNNTIYAQSIDIQAAHNKLNSNTTQTYEQKXLTVAFSSPVTD 226

P+G+ 1+ IIAAN++ +Q YEQK +TVA S PV +
pspA: 1353 PQGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPVVN 1396
Homology with a predicted ORF from N. gonorrhoeae

ORF117 shows 90% identity over a 230aa overlap with a predicted ORF (ORF117ng) from 
N. gonorrhoeae: 

orf 117 . pep SGNNLNAKAAEVSSANGTLAVSANNDINIS 30 

5 I i I I II I I I I I I : I I : I | | M I : I I I : I I 

orfll7ng IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTIAVYAKNDITIS 480 

orf117.pep AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 90
HI:: :lllllll!IIIIIIIMtllllllllllllltlllillllllllllllillll
orf117ng SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 540

orf 117 . pep NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150 

Mill 111:1 1MIIIIMI! I III II IIIIIIMMMIMIM Ml I I IIIMilll I 
orfll7ng NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600 

orf 117 .pep NEHTGSTVGSLKGDTT I VAGKHYEQIGSTVS S PEGNNT I YAQS I DIQAAHNKLNSNTTQT 210 

M I M M I M I II M I I I I: II I M iMIMIIMI I :||:|| I I : I : I I I : I I I I 
orfll7ng NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660 

orf117.pep YEQKXLTVAFSSPVTDLAQQ 230

MM I II II I II I II I I I I

orf117ng YEQKGLTVAFSSPVTDLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 720

An ORF117ng nucleotide sequence <SEQ ID 519> was predicted to encode a protein having amino
acid sequence <SEQ ID 520>:

1 ..LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD

201 NGATAARSMN LSVGIALSAE QAAQLTSDXV WLVQKEVKLP DGGTQTVLMP

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL

701 MPWRLPMQVG RLFKQAKAPK K* 

Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>:

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 
1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 522; ORF117ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

ORF117ng-1 shows the same 90% identity over a 230aa overlap with ORF117. In addition, it
shows homology with a secreted N.meningitidis protein in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60

L+V T + L N++T GK + ++GLHYR +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840 

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAGREALILNAQN 1019 



Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359

+ N+G++ ADINGIAE LLL A NNI ++S +S+QN QGS

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILG 539 

+G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTESWGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378 

Query: 659 QTYEQKGLTVAFSSPVTD 676

Q YEQKG+TVA S PV +
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396

Based on this analysis, it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 63 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 523>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC 

401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA 

501 CGTGCGCATC GACTTCATCT CCTAT...

This corresponds to the amino acid sequence <SEQ ID 524; ORF119>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK 

151 PLITLKELSK VELSWFDVRI DFISY. . . 
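
The correspondence between a nucleotide listing and its amino acid sequence can be checked with a short script. The following is a minimal sketch, not the method used to generate the listings: it assumes the standard genetic code (which assigns the same amino acids as the bacterial code for internal codons), reading frame 1, and that any codon containing an ambiguous base (such as the `w` at position 134 of <SEQ ID 523>) is reported as `X`. The helper names `join_listing` and `translate` are illustrative only.

```python
import re

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def join_listing(rows):
    """Drop position numbers, spaces and dots from listing rows; return one string."""
    return re.sub(r"[^A-Za-z]", "", " ".join(rows)).upper()


def translate(dna):
    """Translate in frame 1; codons with non-ACGT symbols become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First two rows of <SEQ ID 523> as printed above.
rows = [
    "1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA",
    "51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG",
]
print(translate(join_listing(rows)))
# MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQF
```

The output agrees with the first 33 residues of <SEQ ID 524>; the trailing base left over after the last complete codon is ignored.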

Further work revealed the complete nucleotide sequence <SEQ ID 525>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 526; ORF119-l>: 

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N.meningitidis (strain A)

ORF119 shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) from strain A of
N.meningitidis:

10 20 30 40 50 60 

or f 1 1 9 . pep MI YIVLFLAWLAWAYNMYQENQYRKKVRDQFGHS DKDALLNSXT SHVR DGKPSGGSVM 
I I I I I Ml 1:1 I III I I I I Ml MM I I I II M I I I I I [ M I I I MMMMMM I ! 
orfll9a MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 119 . pep MPK PQPAVKKTAKPQDPXMRNLQEQDAVY IAKQKQAKAS P FKTE I ETALEE SGI I GNS AH 
MMMMMMI Ml M M M M M M M M M M M M M M M M M M I M ! M 
orf 119a MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

70 80 90 100 110 120 

130 140 150 160 170 

orf 11 9 . pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 
M MMMM MM I M I : I M I M M M M M M I M M M M 1 : 1 1 I I I 

orf 11 9a TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 
130 140 150 160 170 180 

orf 119a AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
190 200 210 220 230 240 

The complete length ORF119a nucleotide sequence <SEQ ID 527> is:

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA 

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 528>: 



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR
51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP
101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK
151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG
201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA
251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS
301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS
351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV
401 RTYVLARQSE MLKVGIEPGG KTALRLFS*



ORF119a and ORF119-1 show 98.6% identity in 428 aa overlap:



10 20 30 40 50 60 

orf 1 19a. pep MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 
I f 1 I I 1 I I I : I I I I I 1 I I 1 I t I I I I I I I I t i I 1 I i I i I I I I I I I I I 1 I I I I I I I I 1 t il 
orf 119-1 MI Y I VL FLAWLAWAYNMYQENQ YRKKVRDQFGHS DKDALLN SKT SH VRDGKP SGG S VM 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 119a . pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 
llllilllillll II II llll 111 Hill Mill II I I I I I 1 M I I M I Ml I M I M I 
orf 119-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 11 9a . pep TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 
II 1 I I 1 f I I I ! I I I I 1 I I I I : I I 1 1 I I 1 1 I I I I I I f I I I I I I I 1 I I I i i I I I 1 I i I I i I 
orf 1 19-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 119a. pep AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
I I I i I I 1 II I I I I I I I II I III Mill I II I II IMMI MMM Ml III Mill I II I 
orf 119-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 119a. pep AFNRQVDAFAHSMGGQTLHT DLAAFIEVAS ALDAFCARVDQT I AIHLVS PT S I SGVELRS 
I I I I I I I I I I : II I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
or f 1 1 9- 1 AFNRQVDAFAQSMGGQT LHTDLAAFIEVASALDAFCARVDQT I AIHLVS PTS I SGVELRS 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 11 9a . pep AVTGVGFVLEDDGAFHYTDTSGSTMFS ICSLNNEPFTNALLDNQSYKGFSMLLDI PHSPA 
IIIIIM MMIM II II II MMIMMM Mill llll IMMI Ml MM I IMM I 
orf 119-1 AVTGVGFVLEDDGAFHYTDTSGSTMFS ICSLNNEPFTNALLDNQSYKGFSMLLDIPHS PA 

310 320 330 340 350 360 



370 380 390 400 410 420 

or f 11 9a . pep GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 
III I IMMI Mil II MM II I ill Mill MM II I I I 1 I I I I I I I 1 II III MMM 
orf 119-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

370 380 390 400 410 420 



429 
orf119a.pep KTALRLFSX
I I I I I I I I I 
orfll9-l KTALRLFSX 
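
The identity figure quoted for an alignment of this kind can be sanity-checked by counting matching positions. The sketch below is an illustration under stated assumptions rather than the program used for the analysis: it handles only ungapped alignments of equal length, and the function name `percent_identity` is hypothetical. It is applied to the first 60 residues of ORF119a and ORF119-1 as listed in <SEQ ID 528> and <SEQ ID 526>.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Residues 1-60, written in blocks of ten as in the listings and then joined.
orf119a_head = "".join(
    "MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR DGKPSGGPVM".split()
)
orf119_1_head = "".join(
    "MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR DGKPSGGSVM".split()
)

# Two of the 60 positions differ (residues 10 and 58).
print(round(percent_identity(orf119a_head, orf119_1_head), 1))
```

Over the full overlap, the stated 98.6% is consistent with 422 identical positions out of 428 (422/428 ≈ 98.6%).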

Homology with a predicted ORF from N.gonorrhoeae

ORF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (ORF119ng) from 
N. gonorrhoeae: 

orf119.pep MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60

i 1 t 1 1 1 I t ! * t I 1 I I I I t I t I I t I I I I I 1 1 1 1 1 I I I I f I I 1 I | t MIMIIIIMI II 
0rfll9ng MI Y I VLFLAAVLAWAYNMYQENQYRKKVRDQFGHS DKDALLN SKT SHVRDGKPSGGPVM 60 

orf 119 . pep MPKPQPAVKKTAKPQD PXMRNLQEQDAVYI AKQKQAKAS PFKTE I ETALEESGI IGNSAH 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I |'| | | | | | | | | | 
orfll9ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGI IGNSAH 120 



orf 119 . pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175 

ilMMMIM I I I I I I i I: 1 Ml II III III M I Ml III MMMIIMI 
orfll9ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180 

The complete length ORF119ng nucleotide sequence <SEQ ID 529> is:

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA
51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG
101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC
151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC
201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG
251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG
301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA
351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC
401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA
451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA
501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC
551 TGCACGCACT GCCGCGCCTT tccAACCGCT GCCGCTACCA GATTGTCGGC
601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG
651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG
701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA
751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA
801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG
851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC
901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA
951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG
1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT
1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA
1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC
1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA
1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA
1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA



This encodes a protein having amino acid sequence <SEQ ID 530>: 



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR
51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP
101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK
151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG
201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA
251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS
301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS
351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV
401 RTYVLARQSE MLKVGIEPGG KTALRLFS*



ORF119ng and ORF119-1 show 98.4% identity over 428 aa overlap:



10 20 30 40 50 60 

orf119ng MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
M I I II I I I : ! I II I I I M I | | I | I I I | I | I I I I I I I I || 1 | | | 1 | | i | || | | | | | | |(
orf119-1 MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM

10 20 30 40 50 60

70 80 90 100 110 120

orf119ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH
Ml II I I IN Mill II I III Mill I IIMII II M Ml MM Mill II Mil II
orf119-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

70 80 90 100 110 120

130 140 150 160 170 180

orf119ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
I I I 1 1 M I M M I M M 1 1 II M M I M II M M I I I I M M I M M M II I M M I I M
orf119-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE

130 140 150 160 170 180

190 200 210 220 230 240

orf119ng AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
I 1 1 1 1 1 M II M M I II II I i M 1 1 I M M M 11 11 II II II I i M II I I M M I 1 1 M I
orf119-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS

190 200 210 220 230 240

250 260 270 280 290 300

orf119ng AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
MMMIMMIMMMMMMI IIMIMMM II I II 1 1 II II I M II II II M I I
orf119-1 AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS

250 260 270 280 290 300

310 320 330 340 350 360

orf119ng AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
M I II II II II M II I II 1 1 II I M I II I I I I M I M II I M M I II I I I II II M M M
orf119-1 AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA

310 320 330 340 350 360

370 380 390 400 410 420

orf119ng GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
M M 1 1 I II I M M I M II M I II M II II I 1 1 M II M II M M I M I II II I M M II
orf119-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG

370 380 390 400 410 420

429

orf119ng KTALRLFSX
II I II I I II
orf119-1 KTALRLFSX



Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 64 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 531>:



1 .GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC.ATCAG
51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA
101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG
151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT
201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT
251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC
301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC
351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT
401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA
451 TTGGCACAGG ATTGA



This corresponds to the amino acid sequence <SEQ ID 532; ORF134>: 

1 ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM 

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV 

101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA 

151 LAQD* 
Further work revealed the complete nucleotide sequence <SEQ ID 533>:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT
51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG
101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG
151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG
201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA
251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT
301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA
351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA
401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA
451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG
501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT
551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG
601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA
651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA
701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC
751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC
801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA
851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT
951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG
1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC
1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC
1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG
1151 CATTGGCACA GGATTGA



This corresponds to the amino acid sequence <SEQ ID 534; ORF134-1>:

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT
51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI
251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*



Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical protein o648 of E. coli (accession number AE000189)
ORF134 and o648 protein show 45% aa identity in 153aa overlap: 

0rfl34: 2 RHGTE DFFMNN S DX I RQI VE STTGTMKXXXXXXXXXXXWGG IGVMN IMLVS VTERTKE I 61 

RHG +DFF N D + + VE TT T++ WGG IGVMN IMLVS VTERT+E I 

o648: 496 RHGKKDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555 

0rfl34: 62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121 

GIRMA+GAR ++ QQFLIEA F+ + + S ++++ 

o648: 556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSIXIAFTLQLFLPGWEIGFSPIALL 615 

0rfl34: 122 GAVACSTGIGIAFGFMPANKAAKLN PIDALAQD 154 

A CST GI FG++PA AA+L+ P+ DALA++ 
o648: 616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF134 shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) from strain A of N.
meningitidis:

10 20 30 

orf 134 .pep ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 

I MM II Ml II II MINIMI HI Ml 
orf 134a GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL 
210 220 230 240 250 260 



40 



50 



60 



70 



80 



90 
orf134.pep ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG
I I I I I I I -II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I II I II I 
orf 134a ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG 
270 280 290 300 310 320 

100 110 120 130 140 150 

or f 134 . pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
orf 134a LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 
330 340 350 360 370 380 



orf 134. pep LAQDX 
II I I I 

orfl34a LAQDX 

The complete length ORF134a nucleotide sequence <SEQ ID 535> is: 



1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT
51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG
101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG
151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG
201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA
251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT
301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA
351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA
401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA
451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG
501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT
551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG
601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA
651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA
701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC
751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC
801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA
851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT
951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG
1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC
1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC
1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG
1151 CATTGGCGCA GGATTGA



This encodes a protein having amino acid sequence <SEQ ID 536>: 



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134a and ORF134-1 show 100.0% identity in 388 aa overlap: 



orf 134a . pep MS VQAVLAHKMRS LLTMLG 1 1 IG I AS WS WALGNGSQKKI LEDI S S IGTNT I SI FPGRG 
1 1 1 1 I i 1 1 1 i I 1 1 I 1 1 1 1 I I 1 1 1 I I 1 1 1 1 I I ! 1 1 1 f I i I I I t I t I I I I I I i 1 1 1 I 1 I 1 1 I 
orf 134-1 MS VQAVLAHKMRS LLTMLG I I IG I AS WS WALGNGSQKKI LEDI SS IGTNTI SI FPGRG 



orf 134a . pep FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 
IMIIIIIIIIIIIIilllllllllllllllMIIMIIIIIIMllMIIIIIMIIII 
orf 134-1 FGDRRSGRIKTLTI DDAKI IAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orf 134a . pep RGLKLETGRLFDENDVKEDAQVVVIDQNVKDECLFADSDPLGKTILFRKRPLTVIGVMKKD 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 r 1 1 1 1 1 1 1 1 1 f i 

orf 134-1 RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 

orf 134a . pep ENAFGNS DVLMLWS PYTTVMHQI TGE SHTNS I TVKIKDNANTQVAEKGLT DLLKARHGTE 
I I t I t I I I f I I ! 1 I I 1 1 I I t 1 I 1 1 I ! I f I 1 I I I I I I I I I I 1 I 1 1 1 I I I | I I | | | | | f 1 1 1 
orf 134-1 ENAFGNSDVLMLWS PYTTVMHQI TGE SHTNS ITVKIKDNANTQVAEKGLTDLLKARHGTE 
orf134a.pep DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
IMIIIIIIIIIIMlll tl I IMMIIIIIMIIMIIII III III Ml INI j| INI 
orf 134-1 DFEWNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA 

orf 134a .pep IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

1 1 1 1 1 1 1 r 1 1 1 1 1 1 1 1 1 1 1 1 i e 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 134-1 IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

orf 134a. pep STGIGIAFGFMPANKAAKLNPIDALAQDX 
I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
orf 134-1 STGIGIAFGFMPANKAAKLNPIDALAQDX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF134 shows 96.8% identity over a 154aa overlap with a predicted ORF (ORF134.ng) from N. 
gonorrhoeae: 

orf 134 . pep ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 30 

Mill MM Ml II I II: I II MM III I 
orfl34ng GESHTNSITVKIKDNANTRVAEKGLAELIjKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 264 

orf 134 . pep ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 90 

I I I I I M I II II M I M I M I II II M II I It I I I I I I I || | | || || || : | | | 

orfl34ng ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICIIGG 324 

orf 134 . pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150 

I I I E I I I I I I I I I 1 1 i I 1 1 t I I I 1 1 I I I I i I M II I M I 1 1 M II II I 1 I I II I M I M 
orfl34ng LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384 j 

orf 134. pep LAQD 154 

Mil 1 
orfl34ng LAQD 388 

The complete length ORF134ng nucleotide sequence <SEQ ID 537> is: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG 

151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAAAATCAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA 

401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA 

701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC 

751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG 

1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 
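
For readers who wish to process these listings computationally, the numbered layout used above (a position index followed by blocks of ten residues) can be collapsed into one contiguous string. The following Python sketch is illustrative only and is not part of the work described here; the function name parse_numbered_listing is hypothetical. It is shown on the first two rows of <SEQ ID 537>.

```python
def parse_numbered_listing(block: str) -> str:
    """Collapse a numbered, ten-residue-block listing into one string.

    Assumption: each non-empty line starts with a numeric position index,
    followed by whitespace-separated residue blocks.
    """
    residues = []
    for line in block.splitlines():
        tokens = line.split()
        # Drop the leading position index (for example "1" or "51").
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]
        residues.extend(tokens)
    return "".join(residues).upper()


# First two rows of the ORF134ng nucleotide listing.
listing = """
1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG
"""

sequence = parse_numbered_listing(listing)
print(len(sequence))
print(sequence[:10])
```

The same function applies unchanged to the amino acid listings, because it only strips the index and the spacing and does not interpret the residues.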

This encodes a protein having amino acid sequence <SEQ ID 538>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT

51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI

251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap: 

orfl34ng MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQBCKILEDISSMGTNTISIFPGRG 
lillllll II II III I II t III I I ttlllltll I I Mil INI III IIMIIII It 
orf 134-1 MSVQAVLAHKMRSLLTMLGIIIGIASWSVVALGNGSQKKILEDISSIGTNTISIFPGRG 

orfl34ng FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 
III II II: II II II IMM III lllllllllll II MM MM III I II I IIMIIII II 
orf 134-1 FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orfl34ng RGLKLETGRL FDENDVKEDAQWV I DQNVKDKLFADSDPLGKT ILFRKRPLTVIGVMKKD 

III II II II I Mil I I II ! M 1 1 II I III I Mill I Mllllllll I II I III Mill II 
orf 134-1 RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADSDPLGECTILFRKRPLTVIGVMKKD 

orfl34ng ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE 
1 1 M I I M I M 1 1 1 M I I M i M I M 1 1 II M M M I M M I: M M M :: 1 1 1 1 1 II M 
orf 134-1 ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE 

orfl34ng DFFMNNSDSIRQMVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA 
Ml Mill t 111:11 I Ml MM I Mill IMM Ml IIMM MM MM MM ill II 
orf 134-1 DFFMNNSDSIRQIVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 

or f 13 4ng IGARRGNILQQFLIEAVLICI IGGLVGVGLSAAVS LVFNHFVTDFPMDI SAAS VTGAVAC 

I M M II II I M I II I M II : I M M M I II I II 11 I II I M I I M I 1 11 1 Mllllll 
orf 134-1 IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

orf!34ng STGIGI AFGFMPANKAAKLNPI DALAQDX 

Mill M I Ml Mill I M I Ml I II II I 
orf 134-1 STGIG IAFGFMPANKAAKLNP I DALAQDX 
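
The identity figures quoted in these comparisons express the fraction of aligned positions at which both sequences carry the same residue. As an illustrative sketch only (the function name percent_identity is hypothetical, and the alignment program actually used may treat gaps or terminal residues differently), the calculation can be written in Python as follows. The two ten-residue fragments are residues 41-50 of <SEQ ID 538> and <SEQ ID 534>.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns holding the same residue.

    Assumption: both strings are already aligned, so they have equal
    length, and gaps (if any) are written as "-".
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Residues 41-50 of ORF134ng and ORF134-1: one difference in ten positions.
fragment_identity = percent_identity("ILEDISSMGT", "ILEDISSIGT")
print(fragment_identity)

# 380 identical positions out of 388 would round to the 97.9% quoted above.
print(round(100 * 380 / 388, 1))
```

An identity of 97.9% over 388 aa is consistent with eight differing positions between the two full-length proteins.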

ORF134ng also shows homology to an E. coli ABC transporter:

sp|P75831|YBJZ_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ >gi5 
(AE000189) o648; similar to YBBA_HAEIN SW: P45247 [Escherichia coli] Length = 
648 

Score = 297 bits (753), Expect = 6e-80

Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%) 

Query: 1 MSVQAVLAHKMRSLLTMLXXXXXXXXXXXXXXLGNGSQKKILEDISSMGTNTI S I FPGRG 60 

M+ +A+ A+KMR+LLTML +G+ +++ +L DI S+GTNTI ++PG+ 

Sbjct: 260 MAWRALAANKMRTLLTMLGI I IGIASWS IVWGDAAKQMVLADIRSIGTNTI DVYPGKD 319 

Query: 61 FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 120 

FGD + L DD I KQ +VASATP S L Y N D+ AS GV YF+V 

Sbjct: 320 FGDDDPQYQQALKYDDLIAIQKQPWVASATPAVSQNLRLRYNNVDVAASANGVSGDYFNV 379 

Query: 121 RGLKLETGRLFDENDVKEDAQVW I DQNVKDKLFAD- SDPLGKT ILFRKRPLTVI GVMKK 179 

G+ G F++ + AQVW+D N + +LF +D +G+ IL P VIGV ++ 
Sbjct: 380 YOdTFSEGNTFNQEQLNGRAQWVLDSNTRRQLFPHKADWGEVILVGNMPARVIGVAEE 439 

Query: 180 DENAFGNS DVLMLWS PYTTVMHQITGESHTNS ITVKI KDNANTRVAEKGLAELLKARHGT 239 

++ FG+S VL +W PY+T+ ++ G+S NSITV++K+ ++ AE+ L LL RHG 
Sbjct: 440 KQSMFGSSKVLRVWLPYSTMSGRVMGQSWLNSITVRVKEGFDSAEAEQQLTRLLSLRHGK 499 

Query: 240 EDFFMNNSDSIRQMVXSTTGTMKXXXXXXXXXXXWGGIGVMlIMLVSvTERTKEIGIRM 299 

+DFF N D + + VE TT T++ WGG I GVMN IMLVS VTERT+E IG IRM 

Sbjct: 500 KDFFTWNMDGVIJCrVFlKTTRTLQLFLTLVAVISLWGGIGV^IMLVSvTERTREIGI^ 559 

Query: 300 AIGARRGNILQQFLIEXXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAASVIGAVA 359 

A+GAR ++LQQFLIE F+ + + S +++ A 

Sbjct: 560 AVGARAS DVLQQFLIEAVLVCLVGGALGITLSLLI AFTLQLFLPGWE IGFS PLALLLAFL 619 

Query: 360 CSTGIGI AFGFMPANKAAKLNPI DALAQD 388 

CST GI FG++PA AA+L+ P+ DALA++ 
Sbjct: 620 CSTVTGI LFGWL PARNAARLDPVDALARE 648 
Based on this analysis, including the presence of the leader peptide and transmembrane regions in
the gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 65 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 539>: 

1 ..GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T.CTGCCTTT
51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT
101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG
151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG
201 CAGCGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT
251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC
301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC
351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG
401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG
451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA
501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG
551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT CCTCAGCGGT
601 ATTTTGA

This corresponds to the amino acid sequence <SEQ ID 540; ORF135>:

1 ..GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV
51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP
101 GWRVVFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM
151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV
201 F*

Further work revealed the complete nucleotide sequence <SEQ ID 541>:



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 542; ORF135-1>:

1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVALGAAAVL RRDXFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR

301 *
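
The correspondence between a nucleotide listing and the amino acid sequence it encodes can be checked by translating the reading frame codon by codon. The Python sketch below is illustrative only: it assumes the standard genetic code, the names CODON_TABLE and translate are hypothetical, and any codon containing an ambiguous base is reported as X, which mirrors the way X appears in the amino acid listings of this specification. The inputs are the first thirty bases of <SEQ ID 541>, its last twelve bases, and the codons surrounding the ambiguous base in its row 151.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base over "TCAG".
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping after the first stop codon.

    Whitespace is ignored; codons with non-ACGT bases are written as "X".
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


print(translate("ATGGATACCG CAAAAAAAGA CATTTTAGGA"))  # start of SEQ ID 541
print(translate("CGCCAAAGATAA"))                      # end of SEQ ID 541
print(translate("GACAmCTTC"))                         # ambiguous base "m"
```

The first call reproduces the opening ten residues of <SEQ ID 542>, the second reproduces its final three residues followed by the stop, and the third yields the X at residue 64.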

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF135 shows 99.0% identity over a 197aa overlap with an ORF (ORF135a) from strain A of N.
meningitidis:

10 20 30 

GTGAMLLLFYAVTILPIATGVTLSYTSSIF 
I I I I I I I I I I I I I IIIIIIIIIMliill 
STVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLS YTSS I F 
50 60 70 80 90 100 

40 50 60 70 80 90 

LAVFS FL I LKER I SVYTQAVLLLGFAGWLLLN PS FRSGQETAALAGLAGGAMSGWAYLK 

1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 i 1 1 1 1 1 1 J 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 

LAVFS FLI LKER I SVYTQAVLLLGFAGWLLLNPS FRSGQETAALAGLAGGAMS GWAYLK 
110 120 130 140 150 160 



orfl35.pep 
orfl35a 

orf!35.pep 
orfl35a 



100 110 120 130 140 150 

orf 135 . pep VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 
MIlMlilllllltlllllltMllllllllllilllllliMIIIIIMIIlltllll 
orf 135a VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 
170 180 190 200 210 220 



160 170 180 190 200 

orf 135 . pep TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCIIISAVFX 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I 
orf 135a TRAYKVGDKFTVASLSYMTWFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAF 
230 240 250 260 270 280 



orfl35a KQRLQS LFRQRX 

290 300 

The complete length ORF135a nucleotide sequence <SEQ ID 543> is: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 544>: 



1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVALGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR

301 *

ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap: 



orf 135a .pep MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 

1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 

orf 135-1 MDTAKKDILGSGWMLVAAAC FT IMNVLI KEASAKFALG SGELVFWRMLFSTVALGAAAVL 
orf 135a. pep
orfl35-l 
orf 135a. pep 
orfl35-l 
orf 135a. pep 
orfl35-l 
orf 135a. pep 
orfl35-l 



RRDTET^TPHWKNHLNRSMVGTGAMLLLFYAVTHLPIATGVTLSYTSSIFLAVFSFLILKE 
I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I M I I I I I I I I I I I I I I I I I I 
RRDXraTPHWKNHLNRSMVGTGAMLLLrc^ 

RI S VYTQAVLLLG FAGWLLLN PS FRSGQETAALAGLAGGAMSGWAYLKVRELSLAGE PG 
I I I I I I I I I I I I II II I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I 
RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 
I I M I I I I I I I! I I I I I ! I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I 
WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

VASLSYMT WFS ALSAAFFLAEELFWQEILGMCI 1 1 LSGILS S IRPTAFKQRLQSLFRQR 
I II II III I II I I II I I I M:i Ml I II I II I Ml I II II I I II I I I II I I I Ml !! I II 
VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRP^AFKQRLQSLFRQR 



Homology with a predicted ORF from N. gonorrhoeae 

ORF135 shows 97% identity over a 201 aa overlap with a predicted ORF (ORF135ng) from 
N. gonorrhoeae: 

orf 135 . pep GTGAMLLL FYAVTXLPLATGVT LS YT S S I F 30 

III MUM I III I I I : I I I II Ml I I I I 
o r f 1 3 5ng STVT LGAAAVLRRDT FRT PHWKNHLNRSMVGTGAMLLL FYAVTHLPLTTGVTLS YTSS I F 335 

orf 135 . pep LAVFS FLILKERI SVYTQAVLLLGFAGWLLLNPS FRSGQETAALAGLAGGAMSGWAYLK 90 

II II M M I II I 1 II I II I M I I II II I M II II II I I II I MIIMIMM II 

orfl35ng LAVFS FL I LKERI S VYTQAVLLLG FAGWLLLNPS FRSGQE PAALAGLAGGAMSGWAYLK 395j 

orf 135 . pep VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 15o| 

I I I 1 I J I ! I I I I i I I I I I t r I I I I 1 I I I I 1 I 1 I i t 1 I I 1 t I I I I I ! I I MIIMIMM 1 
orfl35ng VRELSLAGEPGWRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSM 455 

orf 135 . pep TRAYKVGDKFTVASLS YMTWFSALSAAFFLGEELFWQE I LGMC III SAVF 201 

II I I I I II II I I I I I I I I I II I I I I I II II I II I I I I M II I II I I I I I : I 
orfl35ng TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCIIISAAF 506 

An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino 
acid sequence <SEQ ID 546>: 



1 MPSEKAFRRH LRTASFQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPAAQG
51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEWQILRRL
101 NLGHFTDTHL IAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD
151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH
201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGSGWM LVAAACFTVM
251 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL
301 NRSMVGTGAM LLLFYAVTHL PLTTGVTLSY TSSIFLAVFS FLILKERISV
351 YTQAVLLLGF AGWLLLNPS FRSGQEPAAL AGLAGGAMSG WAYLKVRELS
401 LAGEPGWRW FYLSATGVAM SSVWATLTGW HTLSFPSAVY LSGIGVSALI
451 AQLSMTRAYK VGDKFTVASL SYMTWFSAL SAAFFLGEEL FWQEILGMCI
501 IISAAF*



Further work revealed the following gonococcal sequence <SEQ ID 547>: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC

51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA

151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT

301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA

451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG

551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg

601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG

651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca

701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttcctaTAt gaccgtcGTC

751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA

801 GGAAATACTC GGTATGTGCA TCATTAtccT CAGCGGCATT TTGAGCAGCA

851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA

901 TAA

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-1>:

1 MDTAKKDILG SGWMLVAAAC FTVMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVTLGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLTTGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSAT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSGIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPIAFK QRLQALFRQR

301 *

ORF135ng-1 and ORF135-1 show 97.0% identity in 300 aa overlap:

nrf 135na-l pep MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFSTVTLGAAAVL 
orf!35ng l.pep ( | |(|| ^ ,,,, ,,,,,,,,,,,, , ,, , , , , , , , , , , 

orf 135-1 MDTAKKD I LG SGWMLVAAAC FT IMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 

orf 135na-l . pep RRDTFRTPHWECNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE 
orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 

orfl35na-l.pep RI S VYTQAVLLLG FAGWLLLNPS FRSGQE PAALAGLAGGAMSGW AYLKVRELS kAGE PG 
orfl35ng l.pep , | M 1 1 I 1 I I M M I I I I 1 I I N 

orf 135-1 RISVYTQAVLLLGFAGWLLLNPSETISGQETAALAGLAGGAMSGWAYLKVRELSIA 

orf 135na-l .pep WRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSAL^ 

° tfl 9 P P | , 1 | i | | | : | | | | | | | | | | | I | I I 1 1 I I II I I I I I I I I I I I I I I I H 1 I I I I I i M I I 1 

orf 135-1 WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

orf 135ng-l . pep VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQ^QALreQR 
orf 135ng .pep , , ,,,,,,,, | , | | | | 1 | i | | | | | I I I 1 I II : I I I I I 

orf 135-1 VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 66 

The following DNA sequence was identified in N. meningitidis <SEQ ID 549>: 

1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT

51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA 

101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT 

151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC 

201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG 

251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG

301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT

351 TGTTCAGCAC ACCGTAAATA TAAAGACCGT CAAAATAAAT ATCGTCGATC 

401 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC 

451 TTTGACCATG GCAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA 

501 AAAGcTCGCG CCAAAAATAT TTGAATGTTT TACGGGCGCG TTCGTCGGCA

551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC

601 CATCAT&TCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG 

651 GCTTTCTgcC kTCGGCATCC GATTCGGATT TGAAAAGTTC xraarwyATTCG 

701 GAATAG 

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>:

1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY
51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADVVNR
101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD
151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA
201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE*

Further work revealed the complete nucleotide sequence <SEQ ID 551>:

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG 

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This corresponds to the amino acid sequence <SEQ ID 552; ORF136-1>:

1 MMKRRIAVFV LFPQIIRVLG QLLPKIVNTV PAHRMLFQIF GMFFFFIHQQ

51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADVVN

101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR

151 DFDHGKIQGG NNAAAFPKKL APKIFECFTG AFVGTVYRFV CLFYIINDGI

201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 
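
The correspondence between the nucleotide and amino acid sequences listed in this example can be checked by translating the coding strand with the standard genetic code. The following Python sketch is illustrative only: the function name `translate` and the convention of writing X for any codon that contains an ambiguous base are assumptions made here, not part of the original analysis. It is applied to the first 30 and the last 18 nucleotides of SEQ ID 551.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA string read in frame 1.

    Whitespace (as used in the listing's 10-base groups) is ignored.
    Codons containing anything other than A, C, G, T are written as 'X'
    (an assumption of this sketch). Stop codons are written as '*'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 551.
print(translate("ATGATGAAGC GGCGTATAGC CGTCTTCGTC"))  # MMKRRIAVFV
# Nucleotides 691-708 of SEQ ID 551 (in frame, ending at the stop codon).
print(translate("TCCAAATATT CGGAATAG"))  # SKYSE*
```

Both outputs agree with the corresponding ends of SEQ ID 552.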

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)
ORF136 shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) from strain A of N.
meningitidis:



10 20 30 40 50 59 

orf 136 . pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 
IlillMIII: I IMMIMMMMMMMIM I I I I I I I I I I I I I I I I I I I I I 
orf 136a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQ YLPGIAEIDS 

10 20 30 40 50 60 

60 70 80 90 100 110 119 

PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 
II MM hill II M I M I M II I : II 11 II I I M I II I II II I II II II I II MM 
PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 
70 80 90 100 110 120 

120 130 140 150 160 170 179 

HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 
I : : I : I II II II II I I II I II I M M M I : : I : I : I : : : : 

HAINVKTVKI NIVDPHMFAN FAX FAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
130 140 150 160 170 180 

180 190 200 210 220 230 

orf 136 . pep AFVGTVYRFVCLFYI IN DGI AHH SAPQRVRYLFAPYCGFLPSASDSDLKSSXXSEX 

: II : I : : : : II I I II I I I I II II I II II I I II I M II Ml 

orf 136a R S PARFTGLS ACSTXXMTES PI IS APQRVRYLFAPYCGFLPS ASDSDLKS SKYSEX 

190 200 210 220 230 

The complete length ORF136a nucleotide sequence <SEQ ID 553> is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 



301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT

351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG

401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG

451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA

501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG

551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG

601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT

701 CGGAATAG

This encodes a protein having amino acid sequence <SEQ ID 554>:



1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQXF GMFFFFIHQQ

51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADVVN

101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR 

151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES 

201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 

ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 



10 20 30 40 50 60 

or f 13 6a . pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
tlMIIIIIII: I ! I : I I I I I I I I ! I 1 I I I I I 1 I 1 I I I I I I I I I I I I I I I I I I I I I I 
orf 136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 13 6a . pep PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 
1 I 1 I I I I - I i 1 I I : : ! U I M I I I : I I I II I I I ! I < I I I ' II II I I M I I M I 1 I I I I 
orf 136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 136a. pep HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
I :: I : I I I I I I I I I II I I I I I I I I I I I I I : : I : |: I :: : : 

orf 136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

130 140 150 160 170 180 



190 200 210 220 230 

orf 136a. pep R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

: II: I : ::: I I I I I I I I I 1 I I M I I I I I I I I I I I I 1 1 I I I M 

orf 136-1 AFVGTVYRFVCLFYI INDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 



Homology with a predicted ORF from N. gonorrhoeae

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from 
N. gonorrhoeae: 

orf 136 .pep MKRR I AVFVLFPQI IRVLGQLLPKI VNTVPAHRML FQ I FGMFFFFI HQQYL PG I AE I DS 59 

MINIUM: | I t : I ! II M I M I I I 1 M I I I I I I I M I M f 1 I : I M 1 I I t I I t I 
orfl36ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQI FGMFFFFI HRQ YLPGIAEIDS 60 



orf 136 .pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 119 

I 11 I II : I 1 I I J t I II II M I I I I II II I M II I II : II I II M II I M II I till 

orfl36ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 120 

orf 136 .pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179 

M I II I I I I I II I I II I I I M I I 1 1 II II I M II I I I N II I I 1 II N I I N I: 1 1 II II 

orfl36ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPBCVFECFTG 180 



orf 136 . pep AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 234 

I I : I I I M I M I M II ! M I I I I : M I t M ! M E M NN I I I I 1 I t 1 I M 
orfl36ng AFAGTVYRFVCLFY I IN DG IAHHTAPQRVRYLFAPYRGFLPPAS DS DLKS SKYSE 235 

The complete length ORF136ng nucleotide sequence <SEQ ID 555> is: 



1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 
51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC
101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCAA

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG 

651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 556>: 

1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQIF GMFFFFIHRQ

51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN

101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR

151 DFDHGKIQGG NNAAAFPKKL APKVFECFTG AFAGTVYRFV CLFYIINDGI

201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE* 

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap: 

orf!36ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 
I I II I II I I II : I I I : I I I I I I I I I II I I I I II I I I I I I I I I I I I I : I I |!| I I I I I I I 
orf 136-1 MMKRRI AVFVLFPQI IRVLGQLLPKIVNTVPAHRMLFQI FGMFFFFIHQQYLPGIAE I DS 

orfl36ng PGG I VFGTLLFRHLSAHCLYGKAAVGDAVAHEH PVADVANRNANAFALFD IGQSAGFIVQ 

I llllhllllll II I I II I I II I I I I I I I I I I I I I : I I I I I I I I I II I I I I I I I I I 
orf 136-1 PCG I VFGALLFRHL PAHCLYGKAAVGDAVAHEH PVADWNRNANAFALFDIGQFAGFI VQ I 

orfl36ng HTVNIKTVKINIVDPHMF7VNFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG i 

MMIMIM1NII MINI I II I MIMII IN II Nil I II II I I I I I I I : I I I I I I 
orf 13 6-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

orfl36ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX 
M:lllfMIIIIIIIIIIIIM:|||||IMMII MM I I I I I I I I I I I I 1 1 
orf 136-1 AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 67 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 557>:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC.TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT 

251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC 

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>: 

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGNLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA..

Further work revealed the complete nucleotide sequence <SEQ ID 559>:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-1>:

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAVRKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV
201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 * 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from strain A of N.
meningitidis:



or f 137. pep 
orfl37a 



10 20 30 40 50 60 

MENMVTFSKIRPLIAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 
1 I 1 1 1 1 1 1 1 1 I I I I 1 1 1 1 1 1 1 I I I MlllhllllllimilllMIMMIMIM 
MENMVTFSK IRPLLAI AAAALLAACGTAGNN AARKPVQTAKPAAWGLALGGGAS KG FAH 

10 20 30 40 50 60 



70 80 90 100 110 120 

VGIIKVLKENGIPVKWTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 
I, UN Ml Mil MMIMIM I 111:11 MINI I M I IUMIIMM IMI I 11:1 
n-rf!37a VGIIKVX.KENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 
° rt 70 80 90 100 110 120 



or f 137. pep 



130 140 149 

FI KGAKLQNY INRKLRGMQI QQFPIKFAA 
| | | | | | | || I | I I : I : I I I I II I I I I 

FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 
130 140 150 160 170 180 

The complete length ORF137a nucleotide sequence <SEQ ID 561> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 



601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA

651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 562>: 

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVGLAL

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV

201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY

301 * 

ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 

or f 1 37a . pep MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 
t I I I I I I I I I I 1 I I I I ! I I I I I I I I I I i II t I : I I I I I I I 11 I I II I t I I I II I I I I M I 
orf 137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQT/^PAAWGLALGGGASKGFAH 

orf 137a. pep VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 
1 1 I 1 1 1 1 1 ! I II 1 1 I I I 1 1 I I I I 1 1 I I I 1 I I I 1 1 1 1 1 I M I 1 1 I I I I II II l!l I M I I I I 
orf 137-1 VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orf 137a . pep FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 
I I I I I I I II 1 I I I I I I I I : I I I I I II I I I I I I I I I I I I I II II I I I I I I I I I I I I I I II I 
orf 137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV | 

orf 137a. pep FQPVIIGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV | 

I I M I I I I I I I I I I I I I I I I I I I I I I i I I M I I I I I I : I I I I I I I I I I I I I II M 

orf 137-1 FQPVIIGRHTWDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 

orf 137a . pep MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPE IKRKLAAYRY 

1 1 1 1 1 1 1 1 1 1! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 it 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 ii i ii 

orf 137-1 MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

Homology with a predicted ORF from N. gonorrhoeae

ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from 
N. gonorrhoeae: 

orf 137. pep MENMVT FS K I RPLLAI AAAALLAAX RT AGNN AVRK PVQTAK PAAWGLALGGG AS KG FAH 60 

I I 1 1 M II I I I : I I I I I I I I I I I I I U I I : I I I I I I I I I I I I I : I I I I I I I I I I I I I 
orfl37ng MENMVT FS KIRS FLAI AAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKG FAH 60 

orf 137 . pep VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120. 

: I I: II I I I I II I II I I I I I I I I I I I I : I : II I I I I I I I I II I I II I II I I I I I II I I : I 
orfl37ng IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120 

orf 137. pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA 149 

I I I I I I I I I I I I I : I I I I I I I I I I I I 
orfl37ng FIKGEKLQNY I NRKVGGRQIQQFPI KFAAVAT DFETGKAVAFNQGNAGQAVRAS AAI PNV 180 

The complete length ORF137ng nucleotide sequence <SEQ ID 563> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC
501 CGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG

551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG 

751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 564>: 



1 MENMVTFSKI RSFLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVALAL

51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV

201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY

301 * 

ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 



orfl37ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKGFAH 
I 1 I I I 1 I I 1 I I : I I t I 1 I I I I I I I I I II M I : I II I ! I I I I II ! I : II I I I I I I I II II 
or f 1 3 7 - 1 MENMVTFSKIRPLLAI AAAALLAACGTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 

orfl37ng IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 
:M:III I MM MINI I IMII I III 1:1 I MIIMI IIMIII IMMI Mil MM 
orf 137-1 VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 



orfl37ng FIKGEKLQNY INRKVGGRQIQQFPIKFAAVAT DFETGKAVAFNQGNAGQAVRASAAI PNV 

MIMMMMMMMMMMMMMII IMIIIIMIMMIIMMIIIII MM 
or f 1 3 7 - 1 FIKGEKLQN YINRKVGGRQIQQFP IKFAAVAT DFETGKAVAFNQGNAGQAVRASAAI PNV 

o r f 1 3 7 ng FQPV I IGRHKYVDGGLSQPVPVSAARRQGAN FVI AVDI SARPSKNVGQGFFS YLDQTLNV 

M M I I M I llllllllllllllllllllllllllllllll:||::||||lllllllll 
orf 137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGAN FVI AVDI SARPGKNISQGFFS YLDQTLNV 

orfl37ng MSVSVLQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 
I I I h I I I I I I I I I I I I I I I I I I I I I I I I 1 I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 1 37 MSVS ALQNELGQAD W IKPQVLDLGAVGG FDQKKRAIRLGEEAARAALPE IKRKLAAYRY 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site 
(underlined) in the gonococcal protein, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
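
A lipid attachment site of this kind is usually recognised as a short "lipobox" motif ending in the cysteine that becomes lipid-modified. The sketch below is only an illustration of such a scan: the pattern `[LVI][ASTVI][GAS]C` is a simplified motif assumed here, not the rule used by the software that produced the prediction above, and the function name `lipobox_cysteines` and the 40-residue N-terminal window are likewise hypothetical choices.

```python
import re

# Simplified lipobox-like motif (an assumption of this sketch):
# three small/hydrophobic residues followed by the lipid-modified Cys.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def lipobox_cysteines(protein, window=40):
    """Return 1-based positions of Cys residues that end a lipobox-like motif.

    Only the first `window` residues are scanned, since the motif is
    expected near the N-terminus. Whitespace in the input is ignored.
    """
    seq = "".join(protein.split()).upper()[:window]
    # m.end() is the 0-based index just past the Cys, i.e. its 1-based position.
    return [m.end() for m in LIPOBOX.finditer(seq)]


# First 40 residues of ORF137-1 (SEQ ID 560).
n_term = "MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAVRKPVQTA"
print(lipobox_cysteines(n_term))  # [25]
```

With this assumed motif, the scan reports the cysteine at position 25 (in the segment LLAACGTAGN) as the candidate attachment residue.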



Example 68 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTC. . 

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 
101 MFKAVHGWEH VQQALDKHEG LLF

Further work revealed the complete nucleotide sequence <SEQ ID 567>:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 568; ORF138-1>:

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of N. 
meningitidis: 



10 20 30 40 50 60 

orf 138 . pep MFRLQFRL FP PLRTAMH I LLT ALLKCLS LLPLSCLHT LGNRLGHLAFYLLKE DRAR I VAX 

I III MM II II MUM ill Mill III I1MII! I! II IIIIMII I I M I III II I 
orf 138a MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 138 . pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
M I II 1 1 I I M I M I I M I I 1 1 I i ! I I M 1 1 I I M M M I I II I I I I i I I I I I I I I M I I 
orf 138a MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

70 80 90 100 110 120 



orf 138. pep LLF 
IM 

orf 138a LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
130 140 150 160 170 180 

The complete length ORF138a nucleotide sequence <SEQ ID 569> is: 



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA 

201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC
601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 570>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 



orf 138a . pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 
I I I I I I I I I ! I I I I ! I I I I I I I I II I I I I I I! I I I I I I I I I I I I I I II I I I I I I I I I I I I 
orf 138-1 MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf 138a . pep MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
I I I I I : I I I I I I II I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I II II M I I I I I I I I 
orf 138-1 MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

or f 138a . pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
I I | I I I I I I I I I II I I I I 1 I I I I I I I I I I I I I I I II I I I I I M I I I I I I I M I I I I I II I 
orf 138-1 LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

orf 138a . pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 
I I I I I I I i 1 ! I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I M i I I I I I II I I I I i t I I I 
orf 138-1 VKQI IBCALRSGEATIVLPDHVPS PQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 

orJ138a . pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 
I | | | | I I I I 1 I I I I I I I I I I I I I I I I I I II I II I II II I II I I I I I 1 II I I I II II I I 
or f 1 3 8 - 1 CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 
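
The identity figures quoted for these alignments are the number of identical aligned positions divided by the length of the overlap. The Python sketch below illustrates that arithmetic; the function name `percent_identity`, the use of "-" for gaps, and rounding to one decimal place are assumptions made for illustration rather than details of the alignment program used in the original work.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    A column counts as a match only when both residues are the same
    and neither is a gap ('-'). The denominator is the number of
    alignment columns (the overlap length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return round(100.0 * matches / columns, 1)


# Second 60-residue block of the ORF138a / ORF138-1 alignment above.
orf138a = "MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG"
orf138_1 = "MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG"
print(percent_identity(orf138a, orf138_1))  # 98.3 (59 of 60 positions)

# One mismatch over the full 298aa overlap gives the quoted figure.
print(round(100.0 * 297 / 298, 1))  # 99.7
```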



Homology with a predicted ORF from N. gonorrhoeae 

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) from 
N. gonorrhoeae:

orf 138 . pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I 
orfl38ng MFRLQFRLFP PLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARI VAN 60 

orf 138 . pep MRQAGLN PDPKTVKAVFAETAKGGLELAPAFFRKPEDI ETMFKAVHGWEHVQQALDKHEG 120 

MIIIMII r 1 | | f | | ! | } ! | 1 K I 1 1 I 1 1 I r | 1 | I | I I 1 1 1 I I I | I I 1 I I I I I I I II 
orfl38ng MRQAGLN PDTQTVKAVFAETAKCGLELAPAFFKKPEDI ETMFKAVHGWEH VQQALDKGEG 120 

orf 138. pep LLF 123 
III 

orfl38ng LLFITPHIGSYDLGGRYI SQQLPFHLTAMYKPPKIKAI DKIMQAGRVRGKGKTAPTGIQG 180 

The complete length ORF138ng nucleotide sequence <SEQ ID 571> is:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA 

351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC 

451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA 

551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC
601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA

651 ACCTGCATAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG 

701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC 

751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA 

801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC

851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 572>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET

101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY

151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH

201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF

251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP*

ORF138ng and ORF138-1 show 94.3% identity over 299aa overlap: 

orf138-1.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN

I I I I I i 1 I I I I I I M I I II I I I I I I I I I I I I I I I I I I I I | | | | | | || | | | | | | | | | | | | 
orf138ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN

orf138-1.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
I I I II II I I : I I I I I I I I I I I I I I I I I I I I : I I I I I I I I | M | | | | | | | | | | | || | | 

orf138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG

orf138-1.pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
I ! I I I I I I I II I I I I I I I I i I I I I I I I I T I M I I I I I II I I I I I I I I I I ! I I I I I : I 1. 1 
orf138ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG

orf138-1.pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
Ml Ml 111:11 111:111 I Ml II II I 111:11 M.I I I Ml III I II ill I I II Mil | 
orf138ng VKQIIKALRAGEATIILPDHVPSPQEGG-GVWADFFGKPAYTMTLAAKLAHVKGVKTLFF

orf138-1.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP
I I II I I I I II II II II I II I I I : II II II I II I I : M II I II I II I I I II II I I I 
orf138ng CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP

In addition, ORF138ng is homologous to htrB protein from Pseudomonas fluorescens:

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253

Score = 80.8 bits (196), Expect = 9e-15

Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%)

Query: 101 MFKAVHGWEHVQQALDKGEGLLFITPHIGSYD-LGGRYISQQLPFHLTAMYKPPKIKAID 159
+ + V G E +++AL G+G++ IT H+G+++ L Y SQ P Y+PPK+KA+D

Sbjct: 94 LVREVEGLEVLKEALASGKGVVGITSHLGNWEVLNHFYCSQCKPI---IFYRPPKLKAVD 150

Query: 160 KIMQAGRVRGKGKTAPTGIQGVKQIIKALRAGEATIILPDHVPSPQEGGGVWADFFGKPA 219
++++ RV+ K A + +G+ +IK +R G I D P P E G++ FF A
Sbjct: 151 ELLRKQRVQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208

Query: 220 YTMTLAAKLAHVKGVKTLFFCCERLPDGQGF 250

T + +F RLPDG G+

Sbjct: 209 LTSKFVPNMLAGGKAVGVFLHALRLPDGSGY 239

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis
(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it
is a useful immunogen.

Example 69 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 573>:

1 . . GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG 
51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG 
101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG 
151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC 
201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC 
251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT 
301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG 
351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT 
401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG 
451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT 
501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG 
551 CGCGGGCGAT GGTGCTG . . 

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>: 

1 ..AWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW
51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV 
101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV 
151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVL. . 

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 

1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG 

601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT 

851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT TTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 576; ORF139-1>:

1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ
151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL
501 LDGGEGGKQT ETL*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of N. 
meningitidis: 

10 20 30 

orf 139 . pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 

I i I t I i I I I I I I 1 I I I | : | | | | | | || | | | 
orf 13 9a QSVGEYVLIAF AAAVXSVCCLFXLIAIW KAWSAGESWRVI^SETWnAVWMTYPP.gaaa 
270 280 290 300 310 320 

40 50 60 70 80 90 

or f 13 9 . pep VYAAAVLGVVYAAPA RRSAWMRGLM FXPFbWSPVCVSAGVLLL YPQWTAS LPLLIiAMYAL 

M I III Ml I III Mil MMII II I I M I I I I II I I M I I I I M I I I I I I I M I! I 
orf 13 9a VYAAAVLG WYAAAA RRS AWMRGLM FLPFMVS PVCV SAG VLLL X PQWTAS L PLLLAMYA L 

330 340 350 360 370 380 ; 

100 110 120 130 140 150 

orf 139 .pep LAYPBV^DVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPAIJUlGLTIjAA^ j 

I i mill f 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . 

or f 1 3 9a LAYPFVAK DVLSAXDALPPDYGRAAAGLGANGFOTACRTTFPT.T.K'PaT.RP^T.TT r flftfl T rv | 

390 400 410 420 430 440 

160 170 180 189 

or f 139 . pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 

IIMIIM M llllllllllll MM I I I I I I iff 
orf 139a GEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNYARA MVLTLLLAAFALGXFLLL DGGEGG 

450 460 470 480 490 500" 
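
The position of the partial ORF139 sequence within a full-length protein can be found by sliding it along the longer sequence without gaps and counting identical residues at each offset. The sketch below is a minimal illustration under stated assumptions, not the method used to produce the alignment above: `best_ungapped_offset` is a hypothetical helper, only the first 30 residues of ORF139 are used, and the longer sequence is limited to residues 251-350 of ORF139-1 as translated from <SEQ ID 575>.

```python
def best_ungapped_offset(partial, full):
    """Return (offset, matches) for the ungapped placement with most identities."""
    best_offset, best_matches = 0, -1
    for offset in range(len(full) - len(partial) + 1):
        # zip stops at the end of the shorter sequence, i.e. the partial one.
        matches = sum(1 for p, f in zip(partial, full[offset:]) if p == f)
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


# First 30 residues of ORF139 <SEQ ID 574>.
ORF139_START = "AWSAGESWRV" "LMESETWHAV" "WNTLRFSAAA"

# Residues 251-350 of ORF139-1 <SEQ ID 576> (window chosen for illustration).
ORF139_1_WINDOW = (
    "RRAVSDKAVS" "PVMPSPPQSV" "GEYVLLAFAA" "AVLSVCCLFP" "LLAIVVKAWS"
    "AGESWRVLME" "SETWQAVWNT" "LRFSAAAVYA" "AAVLGVVYAA" "AARRSAWMRG"
)
WINDOW_FIRST_RESIDUE = 251

offset, matches = best_ungapped_offset(ORF139_START, ORF139_1_WINDOW)
print(WINDOW_FIRST_RESIDUE + offset, matches, len(ORF139_START))  # 298 29 30
```

Under these assumptions the partial sequence begins at residue 298 of ORF139-1, with a single difference (H in ORF139, Q in ORF139-1) in the first 30 residues.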

The complete length ORF139a nucleotide sequence <SEQ ID 577> is: 

1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN 

351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC

701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT 

851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT 
1401 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA
1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG 
1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA 

This encodes a protein having amino acid sequence <SEQ ID 578>: 



1 MDGRRWAVWG AFALLPSAFL AAMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLXWRG WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ
151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVXGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVXSVCCLFX LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT XRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL XPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARAMVLTLLL AAFALGXFLL
501 LDGGEGGKRT ETL*

ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 



orf 139a . pep MDGRRWAWGAFALL PS AFLAAMVVAPLWAVAAYDGLAWRAVLS DAYMLKRLAWTVFQAA 
I I I I I I : i I I I I II I I I I t I I : I i I I I I I I I i I I I I I I I I I I I I I I I I I t I I I I I I I I I I 
or f 1 3 9- 1 MDGRRWVVWGAFALLPSAElJ^VMWAPLWAVAAYDGIiAWRAVLSDAYMLKRLAWTVFQAA 



or f 139a - pep ATC^VLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG 
I I I I I I I I I I I I I I I I I I II I I I I I I I I I I ! I I I I I I I I I I II II I I I I I I I I I I I III 
orf 139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

or f 139a . pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP 

I ! I 1 1 1 I I i ! I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 1 1 I I I 1 I I 1 Ill 

orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 



orf 139a. pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA 
I I t I I I I ! I 1 I 1 I I I i I I E I I 1 I t I I I I I I I I I 1 ! I I I f I I t I I 1 I I I I I I I I 1 I I II I 
orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 



orf 139a - pep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIWKAWS 
I I I I I I I I I I I II I I II I II I I I II I I I II I I II I I I II I I I I I I I I I IIMIUMI 
orf 139-1 AAGLL YAW FGRRAVS DKAVS PVMPS PPQSVGE YVLLAFAAAVLS VCCL FPLLAI WKAW S 

or f 139a . pep AGESWRVLMESETWQAVWNTXRFS AAAVYAAAVLGWYAAAARRS AWMRGLMFL PfWVS P 
I II I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I II II I I I I II I I I I I I I I I I I I II 
or f 1 3 9- 1 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRS AWMRGLMFLPFMVS P 



orf 139a . pep VCVS AGVLLLXPQWT AS LPLLLAMY ALLAY PFVAKDVLSAXDALPPDYGRAAAGLGANGF 
I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I II II I I I I I I I I I I I II I I I I II I 
orf 139-1 VCV SAGVLLLY PQWTAS LPLLLAMYALLAYP FVAKDVLS AWDALPPD YGRAAAGLGANGF 



or f 13 9a . pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLI YAYXGRAGXDNY 
I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I II I I I I II I I II I I I I II 111 
orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 



or f 1 39a . pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX 
I I II I I I I I I I I I I I I 11111111111:11111 
or f 1 3 9- 1 ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETLX 



Homology with a predicted ORF from N. gonorrhoeae

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from 
N.gonorrhoeae: 

orf 139. pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 30 

I 1 I I I 1 I I I I I I I I II : I I I I I I II I I I I 
or f 1 3 9ng QSVGEYVLLAFSVAVLSVCCLFPLSAIVVKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327 



or f 1 3 9 . pep VYAAAVLG WYAAPARRSAWMRGLMFXPFMVS PVCVS AGVLLLYPQWTASL PLLLAMYAL 90 

1:11111111111 111 : 1 1 I I I £ I I I I 1 I I 1 I I I I I I I I 1 I I I I I I I I M I 1 I I I I 
orfl39ng VFAAAVLGWYAAAARRLVWMRGLVFLPFMVS PVCVS AGVLLLYPGWTAS L PLLLAMYAL 387 
orf 139 . pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150

M I I I I I I I I I I I ! I I ( I I I I { I M | | i ] | M I I I I I I I I I I I I I I I I I I I I I M I I I I I 
orf 139ng LAYPFVAKDVLSAWDALPPDYGRAAAGI£ANGFQTACRITFPLLKPALRRGLTLAAATCV 447 

orf 139 . pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 189 

I II I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I 
orfl39ng GE FAATLFLSRPEWQTLTTL I YAYLGRAGE DNYARAMVLTLLLSAFAVC I FLLLDNGEGG 507 

The complete length ORF139ng nucleotide sequence <SEQ ID 579> is predicted to encode a 
protein having amino acid sequence <SEQ ID 580>: 

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ
151 VPAARLQTAR TLGAGAWRPF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS
301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG
351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL
501 LDNGEGGKRT ETL*

Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>:

1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG

301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT

351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC 

801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT 

851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT 

951 GTGGAATACt ttGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA 

1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC 

1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT 

1401 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA 

1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG 

1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-1>:

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ
151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS
301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG
351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL
501 LDNGEGGKRT ETL*
ORF139ng-1 and ORF139-1 show 95.9% identity over a 513aa overlap:

orfl39ng MDGRCWAVRGAFSLLPSAFIAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

I I I I 1:1 I I I : I I II II I I I I I i I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
orf 139-1 MDGRRWVWGAFALLPSAFIAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

or f 1 3 9ng ATCVLVLPLGVPVAOTLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

II I I II 111 IN II MINI Ml Ml! I I I il Ml MM II Ml I M M MM! I f II I I 
or f 1 3 9 - 1 ATCTLVLPI^VPVAWVI^IAFPGRALVLRLMLPFVMPTLVAGVGVIJU.FGADGLLWRG 

or f 1 3 9ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 
MM! MINI MM III Ml MM 111:111 111 111 I II I Ml I III lltlltl MM 
orfl39-l RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orfl39ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA 
M I II II I I II II I M II II M II II II I II M II II II M II II I II : I II II II II I 
orf 139-1 WLAGGVCLVFLYC FSGFGLALLLGG SRYATVEVE I YQLVMFE LDMAVAS VLVWLVLGVT A 

orfl39ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWS 
I M II M II I M I I II II II II II I II II II 1 I M II I :: I M II I II II I I II II It I 
or f 1 3 9 - 1 AAGLLYAWFGRRAVSDKAVSPVMPS PPQSVGEYVLLAFAAAVLSVCCLFPLLAI WKAWS 

orfl39ng AGESRRVLMESETWQAVWNTIJIFSAAAVFAAAVLGVVYAAAARRLWMRGLVFLPFMVSP 
MM II 1 1 I I I II I I I II M I II II I I : M II M I I I M I II I : I II M : II II II II 
or f 1 3 9 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVS P 

orf!39ng VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

I J I I 1 1 I 1 1 1 1 1 II I M M M II 11 II II II II II I M II II II M II 1 II II II II II 
orf 139-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orfl39ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

II II II M II I II II I II II II I II M II 1 II II M II II I I I II II II I I II II M I II 
or f 1 3 9 - 1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLI YAYLGRAGEDNY 

or f 13 9ng ARAMVLT LLLS AFAVC I FLLLDNGEGGKRTETL 

F i I I t I I I I I : II I : 1 1 1 I I ! : 1 1 I I 1 : I ! I I 
orf 1 3 9- 1 ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETL 

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane 
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from 
N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 70 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 583>:

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 AACGTTTGGT C. . . 

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVR FRIHALLTLV IVSLLTALAT 
51 GLPTGSIVKD ILVKNFGGTL GGVALLVGLG AMLERLV. . 
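
The listings in this document number each row by the position of its first residue and print residues in groups of ten. A listing can be turned back into one contiguous string, and its row numbering checked, along the lines of the following Python sketch. This is an illustration only: `parse_listing` is a hypothetical helper, and it assumes that dots mark unread flanking sequence and carry no residues.

```python
import re


def parse_listing(text):
    """Join a numbered sequence listing into one string, checking row starts."""
    parts = []
    length = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank separator rows carry no data
        match = re.match(r"(\d+)\s+(.*)", line)
        if match is None:
            raise ValueError(f"row without a position number: {line!r}")
        start = int(match.group(1))
        # Keep letters and '*'; drop spaces and the dots marking partial ends.
        residues = re.sub(r"[^A-Za-z*]", "", match.group(2))
        if start != length + 1:
            raise ValueError(f"row {start} does not follow residue {length}")
        parts.append(residues)
        length += len(residues)
    return "".join(parts)


# Partial DNA sequence <SEQ ID 583> as printed above.
SEQ_583_LISTING = """
1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 AACGTTTGGT C. . .
"""

sequence = parse_listing(SEQ_583_LISTING)
print(len(sequence), len(sequence) // 3)  # 261 87
```

The 261 nucleotides recovered in this way correspond to 87 codons, which agrees with the 87 residues listed for ORF140.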

Further work revealed the complete nucleotide sequence <SEQ ID 585>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA



This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>: 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF140 shows 95.4% identity over a 87aa overlap with an ORF (ORF140a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

or f 140 . pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATG LPTGSIVKD 
I I I I I ! M I I I I i I M I I I I I I 1. 1 1 1 I I I : I II I I I 1 1 1 II I I I I I 1 1 I I I I I I i I I I : I 
orfl40a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATG LPTGSIVND 

10 20 30 40 50 60 

70 80 
orfl40.pep ILVKNFGGT LGGVALLVGLGAMLERLV 
: I I I M I I I I I I I II I I I I I I I I 111 
orfl40a VLVKNFGGT LGGVALLVGLGAMLGRLV ETSGGAQSLADALIRMFGEKRAP FALGVASLIF 

70 80 90 100 110 120 

The complete length ORF140a nucleotide sequence <SEQ ID 587> is: 



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA



This encodes a protein having amino acid sequence <SEQ ID 588>: 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*



ORF140a and ORF140-1 show 99.8% identity over a 461aa overlap: 



orf140-1.pep    MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a         MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60

orf140-1.pep    ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120
                :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a         VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120

orf140-1.pep    GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a         GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180

orf140-1.pep    ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a         ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240

orf140-1.pep    VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a         VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300

orf140-1.pep    RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a         RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360

orf140-1.pep    FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a         FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420

orf140-1.pep    FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461
                |||||||||||||||||||||||||||||||||||||||||
orf140a         FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461



Homology with a predicted ORF from N. gonorrhoeae

ORF140 shows 92% identity over a 87aa overlap with a predicted ORF (ORF140ng) from
N. gonorrhoeae:

orf 140 . pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD 60

Ml I I ! I I I I I I I I I I I I i I I I I I I I M : I I I : I I I I I I I : M I t I I M i | M I M I : I 
orfl40ng MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 60 

orf 140 . pep I LVKN FGGTLGGVALLVGLGAMLERLV 87 

: I I I I I I I I I I I I I I I I I II I I I III 
or f 1 4 Ong VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 120 
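
The identity figure quoted for this overlap can be reproduced by counting identical residues position by position. The following Python sketch is illustrative and makes two assumptions: the overlap is ungapped, and the gonococcal residues are the first 87 residues of ORF140ng as listed in <SEQ ID 590>. The helper names `count_matches` and `percent_identity` are hypothetical.

```python
def count_matches(a, b):
    """Count positions at which two equal-length sequences are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same overlap")
    return sum(1 for x, y in zip(a, b) if x == y)


def percent_identity(a, b):
    """Identical positions as a percentage of the overlap length."""
    return 100.0 * count_matches(a, b) / len(a)


# ORF140 <SEQ ID 584>, 87 residues, written in blocks of ten.
ORF140 = (
    "MDGWTQTLSA" "QTLLGISAAA" "IILILILIVR" "FRIHALLTLV" "IVSLLTALAT"
    "GLPTGSIVKD" "ILVKNFGGTL" "GGVALLVGLG" "AMLERLV"
)
# First 87 residues of ORF140ng <SEQ ID 590>.
ORF140NG_OVERLAP = (
    "MDGRTQTLSA" "QTLLGISAAA" "IILILILIVK" "FRIRALLTLV" "IASLLTALAT"
    "GLPTGSIVND" "VLVKNFGGTL" "GGVALLVGLG" "AMLGRLV"
)

matches = count_matches(ORF140, ORF140NG_OVERLAP)
identity = percent_identity(ORF140, ORF140NG_OVERLAP)
print(matches, len(ORF140), round(identity, 1))  # 80 87 92.0
```

Under these assumptions 80 of the 87 positions are identical, consistent with the 92% identity stated above.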

The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a 
protein having amino acid sequence <SEQ ID 590>: 

1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVE TS GGAQSLADAL

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQ DVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPA KAGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLGR K 

301 RGESGSTLEK TVDGALAPA C SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLSDM DVPTTLKTWT VNQTLIAFIG 

451 FALSALLFAI V * 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>: 

1 ATGGACGGGC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG 

251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCGGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG 

851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-1>:

1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPA KAGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLG RK 

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIAFIG 

451 FALSALLFAI V* 

ORF140ng-1 and ORF140-1 show 96.3% identity over a 461aa overlap:

orf 140ng-l .pep MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 
III M I 1 II I I I I II I I I 1 I | I | | | | I I I I I I : I M I I I I : I I I H M I I I I I I I I I I I 
orf 140-1 MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND

orf 14 Ong-1 . pep VLVKNFGGTLGGVALLVGLGAMLGRLVET SGGAQS LADALIRMFGEKRAPFAPGVASL I F 
r I I 1 I 1 } I K I I ! I I I I I 1 I I I 1 I I i 1 I I I I I 1 1 1 I I I I I I ! I 1 t I I I I I 1 ! I I M I I I I 
or f 14 0- 1 ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 

orf 140ng-l .pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASVGAFSVMHVFLPPHPGPIAASEFYG 
1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 I I i 1 1 I 1 1 1 1 1 1 1 1 1 : ! 1 1 1 1 1 1 I I 1 1 1 1 1 1 1 1 1 1 M M 1 1 
orf 14 0- 1 GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 

orfl40ng-l.pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTV 
I I I i I ! I I I I M I I I 1 M I I I I I 1 i I I I t I I : i I I I I I I I I I M I I I : I IMIIIIIII 
orf 140-1 ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 



orf140ng-1.pep VAVMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKMIGSTPVALLISVLAALLVLGRK

I |: I I I I II I II I II I I I I I II I I I I I I I I I I I I I I I : I I I I I : I II I II I : I I : I I I I I 
orf140-1 VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK

or f 14 0ng-l . pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 

I I I I I I : I I I I I I I I I I I : I I I M I I I I I I I I I M I M I I I I I II II II I I I I I I I I I I I

orf 140-1 RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC

orf 140ng-l .pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 
1 1 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 I f I E 1 1 1 1 1 1 1 1 1 1 1 1 1 I i 1 1 I 1 1 1 1 1 1 1 1 1 f 1 1 K 1 1 1 1 1 1 1 
orf140-1 FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG

orf 140ng-l .pep FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV 
I I I I I I I I II I I I II I II II I I I I I I I: I I I I I I I I I I I I I 
or f 14 0-1 FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 

Furthermore, ORF140ng-1 is homologous to an E. coli protein:

gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454;
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454
Score = 210 bits (529), Expect = 1e-53
Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%)

Query: 88  ETSGGAQSLADALIRMFGEKRAPFAPGVASLIFGFPIFFDAGLIVMLPIVFATARRMKQD 147
           E SGGA+SLA+ R G+KR A +A+ G P+FFD G I++ PI++ A+ K
Sbjct: 80  EHSGGAESLANYFSRKLGDKRTIAALTLAAFFLGIPVFFDVGFIILAPIIYGFAKVAKIS 139

Query: 148
           L F L G +HV +PPHPGP+AA+ A+IG + I+G+ +I GY K
Sbjct: 140

Query: 208 VLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTVVAVMLIPMLLIFLNTGV 257
           ++ + + E+L G T+ SD P A V ++++IP+ +I T
Sbjct: 199

Query: 258
           +S L+ + T ++IGS +RG S + AL
Sbjct: 256 VSATLMPPSHPLLGTLQLIGSPMVALMIALVLAFWLLALRRGWSLQHTSDIMGSALP 312

Query: 318 PACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGCFLVALALRIAQGSXXXX 377
           A VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS
Sbjct: 313 TAAVVILVTGAGGVFGKVLVESGVGKALANMLQMIDLPLLPAAFIISLALRASQGS--AT 370

Query: 378
           + LA G +G SH NDSGFW+V + L + V LK
Sbjct: 371

Query: 438
           TWTV T++ F GF ++ ++A++
Sbjct: 431

Based on this analysis, including the identification of a putative leader sequence
(double-underlined) and several putative transmembrane domains (single-underlined) in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 71 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>:

1 . .GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA 
51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG 
101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC 
151 AACTTTTTGG GCAGACACCA CGGGCGCAC. GTCGTCCTGA TTCTCATCGG 
201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG 
251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG 
301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC 
351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG 
401 TACTGATGTT TTTCCGTCCG . . 

This corresponds to the amino acid sequence <SEQ ID 594; ORF141>: 

1 ..DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP ..
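
The correspondence between a partial DNA sequence and its amino acid sequence can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` is hypothetical and is not part of the original disclosure. Codons containing an undetermined base (shown as "." in the listing) are reported as X.

```python
# Standard genetic code in the conventional TCAG ordering (assumption:
# the sequences in this listing use the standard code, frame 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown or ambiguous codons become 'X'."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-mers
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 593 as printed above.
print(translate("GATTTCGGCA TATCGCCCGT GTATCTTTGG"))
```

Applied to the first 30 bases of SEQ ID 593, this yields the first ten residues of ORF141.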

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC 

601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT 

1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG 

1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT AA 

This corresponds to the amino acid sequence <SEQ ID 596; ORF141-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENI*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF141 shows 95.0% identity over a 140aa overlap with an ORF (ORF141a) from strain A of N.
meningitidis:

10 20 30 

orf 141 . pep DFG I S PVYLWVAAAFKHLLSPWAADSYDVA 

MM MMMMMMMMMM 1 1 : 1 
orfl41a WN PDEPAVYTAVEALAGSPTPLVAHLFGQI DFGI PPVYLWVAAAFKHLLS PWAADPYDAA 

40 50 60 70 80 90 



40 50 60 70 80 90 

or f 1 4 1 . pep RFAGVFFAVIGLTS CGFA GFNFLGRHHGRX WLI L IGC IGLI PVAH F LNPAAAAFAAAGL 
MMMMIMMMMMMMMMM lllllllllllll::|||||IMIIIIIII 
orf 1 4 la RFAGVFFAWGLTS CGFA G FNFLGRHHGRS WL ILIGCIGLI PTVH F LNPAAAAFAAAGL 

100 110 120 130 140 150 



100 110 120 130 140 

orf 141 - pep VLHGYSLARRR VIAASFLLGTGWTLMSLA A AYPAAFALMLPLPVLMFFR P 
IIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIII 
orf 1 4 la VLHGYSLARR RVIAASFLLGTGWTLMSLA A AYPAAFALMLPLPVLMFF RPWQSRR LMLTA 
160 170 180 190 200 210 

orf 141a VAS LAFAL PLMTV YPLLLAKTQPAL FAQWLD DHVFGT FGGVRH I QTAFSL FYYLKNLLWF 

220 230 240 250 260 270 

The complete length ORF141a nucleotide sequence <SEQ ID 597> is: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGTTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG 

201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGACG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA 

1451 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 






1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG 
1651 GAAAATATAT TAAAAACAAC AGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 598>: 



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR
101 FAGVFFAVVG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPTVHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG
551 ENILKTTD*

ORF141a and ORF141-1 show 98.2% identity in 553 aa overlap: 



orf 141a . pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II II I I 
orf 141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 



orf 141a . pep LVAHLFGQI DFGI PPV YLWVAAAFKHLL S PWAAD P YDAARFAGVFFAWGLT S CGFAG FN 
1 I I I I I I I I I I I I I 1 I I I I I I I I I I I I I II I I I I I I I I II I I I I I I : I I III I I I I II I 
orf 141-1 LVAHLFGQTDFGI PPVYLWVAAAFKHLLS PWAADS YDAARFAGVFFAVIGLTSCGFAGFN 

orf 14 la. pep FLGRHHGRS WL ILIGCIGLI PTVHFLN PAAAAFAAAGLVLHGY S LARRRVI AAS FLLGT 
MM IN Ml II I IN III IM::MI Mill I Ml IIMI III II I I t MM I IMII I 
orf 141-1 FLGRHHGRSWL I LIGCIGLIPVAHFLN PAAAAFAAAGLVLHGY SLARRRVI AAS FLLGT 

or f 1 4 la . pep GWTLMSIJVAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAF 

I II I I M I I I II I I I I I II I I I I II M I I I I I I I II II II I I II II I I I I I I II I I I I I I 
O r f 1 4 1 - 1 GWTLMSLAAAYPAAFALMLPLPVI^FFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 



orf 141a . pep QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFALPALPLAVWT VCRTRLFSTD 
IMMM III I I 1 1 I I I I I I 1 r | | | 1 | | 1 | | | f I | | 1 | | | | I | | | | | I | | | | | 1 | t 1 1 I 
orf 141-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 



orf 141a .pep WGILGVVWMIAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 
1 1 1 I 1 1 1 I M I M 1 1 1 1 1 1 1 1 1 1 M I II I M 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 I I I I II I I II II I I 
orf 141-1 WG I LGWWMLAVLVLLAVN PQRFQDNLVWLLP PLALFGAAQLDSLRRGAAAFVNW FG IMA 



or f 14 la . pep FGLFAVFLWTG FFAMN YGWPAKLAERAAY FS PYYVPD I DPI PMAVAVLFT PLWLWAI TRK 
IMIIMIM MUM IM III I III III Mill) Mill I MMM I II I Mill I III 
or f 1 4 1 - 1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFS PYYVPDIDPI PMAVAVLFT PLWLWAITRK 

orf 141a. pep N IRGRQAVTNWAAGVT LTWALLMTLFLPWLDAAKSHAPWRSMEAS LS PE LKRE LS DGIE 

I II I I I I I I I I I I I I I II M I I I I I I M I I I I I I I I I I II I I I I II I I I M I II M I I II 
or f 1 4 1- 1 NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 

orf 141a . pep CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 

II II I M M M II II II II II 11 M I M II II I I I I I I I I I I I I I II M I II I II I I I 
or f 1 4 1- 1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRI VLLPQNADAPQGWQTVWQGARPRNKD 



orf 141a. pep SKFALIRKTGENI 
II I II I II MM 
orf 141-1 SKFALIRKIGENI 



Homology with a predicted ORF from N. gonorrhoeae 

ORF141 shows 95% identity over a 140aa overlap with a predicted ORF (ORF141ng) from 
N. gonorrhoeae: 

orf141.pep                                DFGISPVYLWVAAAFKHLLSPWAADSYDVA   30
                                          |||| |||||||||||||||||||  ||:|
orf141ng    WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAAHPYDAA  126

orf141.pep  RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL   90
            ||||||||||||||||||||||||||||| |||| ||||||||||||:||||||||||||
orf141ng    RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSVVLIHIGCIGLIPVAHFFNPAAAAFAAAGL  186

orf141.pep  VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP  140
            ||||||||||||||||||||||||||||||||||||||||||||||||||
orf141ng    VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA  246

An ORF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino
acid sequence <SEQ ID 600>:

1 MPSEAVSARP LCEYLLHLAI RPFLLTLWLT YTPPDARPPA KTHEKPWLLL
51 LMAFAWLWPG VFSHDLWNPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI
101 PPVYLWVAAA FKHLLSPWAA HPYDAARFAG VFFAVIGLTS CGFAGFNFLG
151 RHHGRSVVLI HIGCIGLIPV AHFFNPAAAA FAAAGLVLHG YSLARRRVIA
201 ASFLLGTGWT LMSLAAAYPA AFALMLPLPV LMFFRPWQSR RLMLTAVASL
251 AFALPLMTVY PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL
301 KNLLWFAPPG LPLAVWTVCR TRLFSTDWGI LGIVWMLAVL VLLAFNPQRF
351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFGIMAFGL FAVFLWTGFF
401 AMNYGWPAKL AERAAYFSPY YVPDIDPIPM AVAVLFTPLW LWAITRKNIR
451 GRQAVTNWAA GVTLTWALLM TLFLPWLDAA KSHAPVVRSM EASFSPELKR
501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA
551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD*

Further work revealed the following gonococcal DNA sequence <SEQ ID 601>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 
51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 
101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 
151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 
201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 
251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 
301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC 
351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC 
401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc 
451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC 
501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT 
551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC 
601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 
651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 
701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 
751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 
801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 
851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC 
901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 
951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC 
1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 
1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT 
1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 
1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 
1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 
1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 
1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 
1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 
1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 
1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 
1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 
1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 
1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 
1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence <SEQ ID 602; ORF141ng-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPAEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLIHIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLN
251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD
301 WGILGIVWML AVLVLLAFNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENILKTTD*

ORF141ng-1 and ORF141-1 show 97.5% identity in 553 aa overlap:

orf 141ng-l .pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAV^TAVEALAGSPTP 

I i 1 1 1 1 1 1 1 1 1 f 1 1 1 1 i 1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 i i M 1 1 1 1 1 1 1 

orf 141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 

or f 1 4 lng- 1 . pep LVAHLFGQTDFGI PPVYLWVAAAFKHLLS PWAADP YDAARFAGVFFAVI GLT SCG FAGFN 

M I I I I 1 1 1 I I I I I I I I I 1 1 I I I II I I II 1 1 I | | || | | | | | 1 1 1 1 1 1 1 1 1 1 1 | | 

orfl41-l LVAHLFGQTDFGI PPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAG FN 

orf 141ng-l .pep FLGRHHGRS WLI H IGCIGLI PVAH FLN PAAAAFAAAGLVLHG YSLARRRVI AAS FLLGT 
I I I I I I N M I M I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I | I I I I | | | | | | | | | | 
orf 14 1-1 FLGRHHGRSWLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 

orf 141ng-l .pep GWTLMSIAAAYPAAFALMLPLPVLMFFRPWQSRRLMLT^ 

I I I t t I 1 I I I I I I I I I I I 1 I I | | i | | | | | | | | | I t 1 1 J I t I I I I f f I | t I I I I I I I I I I J 
orf 14 1-1 GWTI^SU^YPAAFAI^PLPVmFFRPWQSRRLMLTAVASIAFALPIKTVYPLLL^ 

orf 141ng-l .pep QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD 
Mil IN 11:11 I III I II I I I: I 11111:111111111 I : I I I I I I I I 1 1| I I I I I | I 
orf 14 1-1 QPAL FAQWLDYHVFGT FGGVRHVQTAFS LFYYLKNLLWFALPAL PLAVWT VCRTRLFST D 

orf 14 lng-1 . pep WGILGIVWMIJVVLVLI^FNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 
11111:11111111111 I I I I | | | | | | | | | | | | | | M I I I I I I I I I I I I I I I I I I | | I | 
orf 14 1-1 WGILG WWMLAVLVLLAVN PQRFQDNLVWLLPPLALFGAAQLDS LRRGAAAFVNWFGIMA | 

orf 14 lng-1 .pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFS PYYVPDI DPI PMAVAVLFTPLWLWAITRK I 

I I t I 1 I I I I I I i I I I I I I I I I I 1 I I I I t I 1 I I I I I I I J J I I 1 I I I I I I I 1 I f I I I I I I I J 
orf 141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFS PYYVPDI DPI PMAVAVL FT PLWLWAITRK 

orf 14 lng-1 .pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASFSPELKRELSDGIE 
ill! II I II I II I II I I MM I II I II I I II II I I II ill I III II: II 111 I II Ml I I 
orf 141^1 NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 

orf 14 lng-1 .pep CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 
III Mill II IMIMI III III II 111:11 Ml I I Mi III Ml I II III Ml II I II 
or f 1 4 1- 1 CIGIGGGDLHTRI VWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 

orf 14 lng-1. pep SKFALIRKIGENILKTTDX 

J I II I I I I I II I I 
orf 14 1-1 SKFALIRKIGENIX 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
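
The text does not state which method was used to identify the putative transmembrane domains. Purely as an illustration, the sketch below scans a protein sequence with a Kyte-Doolittle hydropathy window, one common way of flagging candidate membrane-spanning segments. The window length (19) and threshold (1.6) are conventional choices assumed here, not values taken from this document, and the function name `hydrophobic_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, mean hydropathy) for each window at or above threshold.

    Start positions are 1-based. Residues missing from the scale (for
    example X) contribute 0.0. Window and threshold are assumed values.
    """
    seq = "".join(seq.split()).rstrip("*").upper()
    hits = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        score = sum(KD.get(aa, 0.0) for aa in segment) / window
        if score >= threshold:
            hits.append((i + 1, round(score, 2)))
    return hits


# Example call on residues 101-150 of ORF141ng-1 as listed above.
print(hydrophobic_windows(
    "FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLIHIGCIGL IPVAHFLNPA"))
```

Overlapping hits would normally be merged into a single candidate segment; dedicated predictors should be preferred for any real analysis.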



Example 72 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>:

1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG 

51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 

101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA 

151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 604; ORF142>:

1 . . QSAKWLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA 
51 SGFQVGYTF* 

Further work revealed the complete nucleotide sequence <SEQ ID 605>: 
1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 

1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 606; ORF142-1>:

1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVGY TF*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from
N. gonorrhoeae:

orf142.pep                                QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY   30
                                          |||||||||||:||||||||||||||||||
orf142ng    RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY  313

orf142.pep  DIFTGRALKKPEFFQSRKWASGFQVGYTF   59
            ||||||||||||:||::||::||||||:|
orf142ng    DIFTGRALKKPEYFQTKKWVTGFQVGYSF  342
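
The quoted identity figure can be reproduced from an ungapped overlap by counting matching positions. The following is a minimal sketch; the function name `percent_identity` is hypothetical, and it assumes the two strings have already been trimmed to the aligned overlap (here, the 59 residues of ORF142 and the corresponding C-terminal 59 residues of ORF142ng).

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped strings."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same overlap")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Overlap taken from the alignment above.
orf142 = "QSAKWLSGQTLVGTAIGIRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF"
orf142ng = "QSAKWLSGQTLAGTAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF"

print(round(percent_identity(orf142, orf142ng), 1))  # 88.1
```

Seven of the 59 positions differ, giving 52/59, which rounds to the 88.1% stated in the text.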

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 
1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE 

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVGY SF*

The underlined sequence (aromatic-Xaa-aromatic amino acid motif) is usually found at the
C-terminal end of outer membrane proteins.
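
The C-terminal motif described above can be tested with a few lines of code. This is a minimal sketch, assuming "aromatic" means Phe, Trp or Tyr and that the motif occupies the last three residues; the function name `has_c_terminal_aromatic_motif` is hypothetical.

```python
AROMATIC = set("FWY")  # assumption: Phe, Trp and Tyr count as aromatic


def has_c_terminal_aromatic_motif(protein: str) -> bool:
    """True if the last three residues are aromatic-Xaa-aromatic."""
    seq = "".join(protein.split()).rstrip("*").upper()
    return len(seq) >= 3 and seq[-3] in AROMATIC and seq[-1] in AROMATIC


# C-terminal residues of SEQ ID 608 (ORF142ng) and SEQ ID 596 (ORF141-1).
print(has_c_terminal_aromatic_motif("KWVTGFQVGY SF*"))   # True  (Y-S-F)
print(has_c_terminal_aromatic_motif("SKFALIRKIG ENI*"))  # False (E-N-I)
```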



ORF142ng and ORF142-1 show 95.6% identity over 342aa overlap: 

orf142-1.pep  MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA
              |||||||||||||||||||||||:|||||||||||||||||||||:||||||||||||||
orf142ng-1    MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA

orf142-1.pep  VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:
orf142ng-1    VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS

orf142-1.pep  VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA
              |||| |||||||||||||||||||:||||||||| ||||||||||||||:||||||||||
orf142ng-1    VKLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA

orf142-1.pep  PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf142ng-1    PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT

orf142-1.pep  VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG
              |||||||||| |||||||||||||||||||||||||||||||||||||||||||:|||||
orf142ng-1    VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG

orf142-1.pep  IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF
              |||||||||||||||||||||||||:||::||::||||||:|
orf142ng-1    IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF

In addition, ORF142ng is homologous to the HecB protein of E.chrysanthemi: 

gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558
Score = 119 bits (295), Expect = 3e-26

Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%)
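
The summary line of a BLAST-style report can be parsed and its percentages recomputed from the raw counts. The following is a minimal sketch; the function name `parse_blast_stats` is hypothetical, and the use of truncating integer division is an assumption that happens to reproduce the figures printed on the line above.

```python
import re


def parse_blast_stats(line: str) -> dict:
    """Map each 'Name = n/m' field to (n, m, truncated integer percentage)."""
    stats = {}
    for name, num, den in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        n, m = int(num), int(den)
        stats[name] = (n, m, 100 * n // m)
    return stats


line = "Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%)"
for name, (n, m, pct) in parse_blast_stats(line).items():
    print(f"{name}: {n}/{m} = {pct}%")
```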



Query: 2 DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61 

DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G 
Sbjct: 230 DN SGQKS TGEEQLNGS LALDNV FGLADQWFI S AGHS SRFATSHDAESLQAG — 280 



Query: 62 HYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLSV 121 

+S P+G W +N++ RY + G S F +R+++RD KT ++ 

Sbjct: 281 -FSMPYGYWNLGYNYSQSRYRNTFINRDFPWHSTGDSDTHRFSLSRWFRDGTMKTAIAG 339 

Query: 122 KLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRAP 181 
R +Y++ + L RK + ++H + A F Y G + 

Sbjct: 340 TFSQRTGNNYLNGSLLPSSSRKLSSVSLGVNHSQKLWGGLATFNPTYNRGVRWLGSETDT 399 

Query: 182 EEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHTV 241 
+++ E + WT SA P Y S++ Q++ L ++L +GG ++ 

Sbjct: 400 DKSADEPRAEFNKWTLSASYYHPV TDS I T YLGSLYGQYSARALYGSEQLTLGGES S I 456 



Query: 242 RGFDGEMSLPAERGWYWRNDLSWQFKP GHQLYLGA-DVGHVSGQSAKWLSGQTLAG 296 

RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G 

Sbjct: 457 RGF-REQYTSGNRGAYWRNELNWQAWQL PVLGNVT FMAAVDGGHLYNHKQDN STAAS LWG 515 

Query: 297 TAIG IRGQIKLGGN LHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 342 
A+G+ + L + G + P + Q V G++VG SF 
Sbjct: 516 GAVGMTVASRW LSQQVTVGWPISYPAWLQPDTMWGYRVGLSF 558 

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 73 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>:

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCJTACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG 

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC. 

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 

1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD 
51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME 
101 KKYRLLIKNN .. 

Further work revealed the complete nucleotide sequence <SEQ ID 611>:

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT 

151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT 

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 

601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA 

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-1>:

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS
51 EKLLTWADTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM
101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI
151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV
201 TLVRILYRRY SNRV*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of N.
meningitidis:



                                             10        20        30
orf143.pep                           MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL
                                          |: :  ||| ||||||||||||||||||||
orf143a     GAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTADIDTALNLLYRLQKLEFL
            20        30        40        50        60        70

               40        50        60        70        80        90
orf143.pep  YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE
            ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf143a     YGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE
            80        90       100       110       120       130

              100       110
orf143.pep  VAQMEKKYRLLIKNN
            |||||||||| ||||
orf143a     VAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIGSTKFILVIGGIPDLGKEA
           140       150       160       170       180       190

The complete length ORF143a nucleotide sequence <SEQ ID 613> is: 



1 ATGGAATCAA CANTTTCACT ACAAGCAAAT TTATATCNCC GCCTGACTCC
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGNCCCCAGT GCCGGTAAAA
101 CTTTGTTGCA CAGCCTGTTG AAAGCGGATG CGGACGAAAT GGTNAGCAGT
151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA
201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG
251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG
301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT
351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT
451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT
601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT
651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA



This encodes a protein having amino acid sequence <SEQ ID 614>: 

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS
51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM
101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI
151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV
201 TLVRXLYXXL QQPRVKLGRE XGLCSNY*

ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap: 

orf143a.pep  MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA
             |||| ||||||| |||||||||||||| ||||||||||||||||||||||||||||| ||
orf143-1     MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA

orf143a.pep  DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf143-1     DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA

orf143a.pep  NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG
             |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf143-1     NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG

orf143a.pep  STKFILVIGGIPDLGKEAFVTLVRXLY
             |||||||||||||||||||||||| ||
orf143-1     STKFILVIGGIPDLGKEAFVTLVRILY

Homology with a predicted ORF from N. gonorrhoeae 

ORF143 shows 95.5% identity over a 110aa overlap with a predicted ORF (ORF143ng) from
N. gonorrhoeae:

orf143.pep  MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL   60
            |||||||||||: ||||||||||||||||||||||||||||||||||| |||||||||||
orf143ng    MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL   60

orf143.pep  SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN  110
            ||||||||||||||||||||||||:||||||||||||||||||||||:||
orf143ng    SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV  120



An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino 
acid sequence <SEQ ID 616>: 

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD 

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME 

101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGIPD 

151 SS KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA 

201 S matmlpsSn sdrvgaisat llalgsrsvq elacgeleqv 

251 MIKGKSGYIL LSQAGKDAVL VLVAKETGRL GLILLDAKRA ARHIAEAI* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 617>: 

1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC 
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 SSSgcg cagcctgttg aaagcggatg cggacgaagt ggtcagcagt 
151 gagaagctgc tcgcggcgga caccgccgac atcgataccg ctttgaacct 

201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC 

251 SSacgg catcaatttg tcggacgagc aattgccgtt GCTGATGGAA 

301 SSScS GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA 
351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT 

401 JggcggS agtcgcacag atggaaaaga aataccggct gctgattagg 

451 aSScCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG 

501 ?Sgcgaa ttgacatttt tcccattgta tatcggttca accaaattta 

551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT 
601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA 

This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-l>: 

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS 

51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME 

101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR 

151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT 

201 LVRILYRRYS NRV* 

ORF143ng-l and ORF143-1 show 95.8% identity in 214 aa overlap: 

««^TTTinoMrTVTIi 1 1 Q 



orfi43-i tl^ii^R^ 120 

orf!43-l NANFTOEAAEEI/SLLAAEVAQMEK^R^ 180 



AO orfl43ng-l.pep STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213 

orfl43-l STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV 214 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 74 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 619>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 accGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 SS GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GaCGGGTCAA wTyCCAGCGT 

401 CCGTGGATG. . 

This corresponds to the amino acid sequence <SEQ ID 620; ORF144>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 621>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACGGCAGTA G 

This corresponds to the amino acid sequence <SEQ ID 622; ORF144-l>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP 
151 LSLGVGISFM VGSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV 
201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 
251 FLLWLNLLWT LVL GGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 
301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKRQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of N. 
meningitidis: 

10 20 30 40 . 50 60 

orf 14 4 . pep MT FLQRLQGLADNKI CAFAW FWRRFDEERVPQXAASMTFTT LLALVPVLTVMVAVAS I F 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I | | I | 
orf 14 4a MTFLQRLQGLADNKICAFAW FWRRFDEERVPQAAASMTFTT LLALVPVLTVMVAVASI F 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 144 . pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQAN RLTAIGSVMLWTSmL IRTID 
I I II I II II I I I I II I II II I I I I I II I I I I I I I I I I II I I I II I I I I I I I II I II I I 
orf 144a PV FDRW S PS FVS FVNQT I VPQGADMVFDY INAFREQAN RLTAI GS VMLWT SXML IRT I D 

70 80 90 100 110 120 

130 

orf 14 4. pep NTFNRIWRVXXQRPWM 
llll MM I Mill 

orf144a NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL 
130 140 150 160 170 180 

The complete length ORF144a nucleotide sequence <SEQ ID 623> is: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 
101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG 
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC 
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 
251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 
301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG 
351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 
401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 
451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC 
501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 
551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG 
601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC 
651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA 
701 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT 
751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 
801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT 
851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 
901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG 
951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 
1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 
1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 
1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 
1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 
1201 CAGGCGAAAA AACAGCAGCA ATCTTGA 


This encodes a protein having amino acid sequence <SEQ ID 624>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSXMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP 
151 LSLGVGISFX VGSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV 
201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL 
301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA 
401 QAKKQQQS* 
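
The ORF144a nucleotide listing contains ambiguous bases (N), and the protein derived from it contains X at some, but not all, of the affected positions. The following Python sketch shows one way such codons can be handled. It assumes the standard genetic code and treats only N as ambiguous; the function name `translate_codon` is hypothetical and the text does not say which software was actually used. A codon is assigned a definite residue when every expansion of N encodes the same amino acid, and X otherwise. This is consistent with GTN, CGN and CNG in SEQ ID 623 appearing as V, R and X in SEQ ID 624.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_codon(codon: str) -> str:
    """Translate one codon; 'N' stands for any base (assumption of this sketch)."""
    codon = codon.upper()
    # Expand each N into all four bases, keep other bases as they are.
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE.get("".join(c), "X") for c in product(*options)}
    # Unambiguous only if every expansion gives the same residue.
    return residues.pop() if len(residues) == 1 else "X"


if __name__ == "__main__":
    for codon in ("GTN", "CGN", "CNG"):
        print(codon, translate_codon(codon))
```

Four-fold degenerate codons such as GTN therefore remain readable even though the third base was not determined.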



ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap: 

orf144a.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 
orf144-1 MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orf144a.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSXMLIRTID 
orf144-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 

orf144a.pep NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL 
orf144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

orf144a.pep RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS 
orf144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

orf144a.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL 
orf144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orf144a.pep DAAQKEGXALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 
orf144-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

orf144a.pep FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 408 
orf144-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 406 

Homology with a predicted ORF from N.gonorrhoeae 

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from 
N. gonorrhoeae: 

orf 14 4 . pep MTFLQRLQGLADNKI CAFAWFWRRFDEERVPQXAASMT FTTLLALVPVLTVMVAVAS I F 60 

Mill II lililtiMlll:|||:|ill!l I I I I I I I ! i I I I I t I i I ! I M M I I I 
orfl44ng MTFLQCWQGSADNKI CAFAWFVIRRFSEERVPQAAASMT FTTLLALVPVLTVMVAVAS I F 60 

orf 144 . pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 120 

Mlltlilllllilllllil I I I I I I I I I: II I: I III I hi il I M I I I Ml Mllll 
orfl44ng PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLVVTSLMLIRTID 120 

orf 144. pep NTFNRIWRVXXQRPWM 136 
1:1111111 r I I 1 1 I 

or f 14 4ng NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVG I SFMVGSVQDSVLS SGAQQWADAL 180 

The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a 
protein having amino acid sequence <SEQ ID 626>: 

1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 627>: 

1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC 

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG 

101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG 

151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 

301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 

401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 

451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 

501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 

551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 

651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 

751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 

801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 

1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 

1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 

1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 

This encodes a variant of ORF144ng, having the amino acid sequence <SEQ ID 628; ORF144ng-l>: 

1 MTFLQRWQGL ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKQQQS* 

ORF144ng-l and ORF144-1 show 94.1% identity in 406 aa overlap: 

orf 14 4ncr-l . pep MTFLQRWQGLADNKICAFAWFVIRRFSEERVPQAAASMT FTTLLALVPVLTVMVAVAS I F 
Mllll llll I II llll llll:lll:IIIIIIM II I I II M I Mil I I Ml II ! M M 
or f 1 4 4-1 MTFLQRLQGLADNK ICAFAW FWRRFDEERVPQAAASMT FTTLLALVPVLTVMVAVAS I F 

orf 144nq-l .pep PVFDRWS DSFVS FVNQT IVPQGADMVFDYI DAFRDQANRLTAIGSVMLWTSLMLIRTI D 
MM I! Ml Mllll! II II Mill Ml II: ilhlMMMM I Mill II III Ml II 
orf 144-1 PVFDRWS DS FVS FVNQT IVPQGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 

orf 144na-l .pep NAFNRIWRVNTQRPWMMQFLVYWALLT FGPLSLGVG I SFMVGSVQDS VLS SGAQQWADAL 
I : I I II II M : I II M M II II M I II I I II II M M M II I i M I : : M II I lh M 
orf 144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

orf 144na-l .pep KTAARLAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS 
:MI I r ! I E I I I I I I I I 1 1 I 1 I 1 M 1 1 1 I 1 I I f I MIMIIM IIMMIMIMM 

or fl 4 4 - 1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

or f 14 4na-l . pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLS YWQGEAFRRGFDSRGRFDDVLKILLLL 
I 1 1 I M 11 II M II II II II I I II I M II II M II I M M II II II I M I I II I M 11 M 
orf 144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orfl44na-l.pep DAAQKEGRTLSVQEFRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL 
|||||||::| II II I II I M M I M 11 II II II I : II M I M II M i I II II II M : II 
orf 1 4 4-1 DAAQKEGKALPVQEFRRHI^GYDELGELI^KLARHGYIYSGRQGWVLKTGADS IELNEL 

orf 144nq-l .pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKQQQS 

| II II II I II II I I I II I I II M I I II I II II II M I II II I II: I 
orf 1 4 4 -1 FKLFVYRPLPVERDHVNQAVDAVMT PCLQTLNMTLAE FDAQAKKRQ 

On the basis of this analysis, including the identification of several putative transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N.meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
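
The text does not state how the putative transmembrane domains were identified. One widely used approach, shown here purely as an illustrative assumption, is a Kyte-Doolittle hydropathy scan: the mean hydropathy of a sliding window is computed and windows above a threshold are flagged as candidate membrane-spanning regions. The window length of 19 and threshold of 1.6 are the values commonly quoted for this scale; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(res, 0.0) for res in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy meets the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, score in enumerate(profile) if score >= threshold]


if __name__ == "__main__":
    # Residues 41-70 of ORF144-1 as listed in SEQ ID 622.
    segment = "TTLLALVPVLTVMVAVASIFPVFDRWSDSF"
    print(candidate_windows(segment))
```

Overlapping flagged windows would normally be merged into a single predicted segment; that step is omitted to keep the sketch short.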



Example 75 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 629>: 

1 ..AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

51 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

201 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 630; ORF146>: 

1 ..RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE ISALVILLQR 

51 TRRKWLDAHE RQHLRQSLLE TREHG* 
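
The relationship between the partial DNA sequence and the amino acid sequence above can be checked by conceptual translation. The following Python sketch assumes the standard genetic code and a reading frame starting at the first listed base; the function name `translate` is hypothetical. Applied to the first row of SEQ ID 629 it reproduces the start of ORF146.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give X."""
    dna = "".join(dna.split()).upper()
    residues = []
    # Step through complete codons only; trailing 1-2 bases are dropped.
    for i in range(0, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


if __name__ == "__main__":
    first_row = "AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA"
    print(translate(first_row))
```

The stop codon is rendered as `*`, matching the convention used in the sequence listings.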

Further work revealed the complete nucleotide sequence <SEQ ID 631>: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA 

401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA 

451 CTCATGCGCG CCATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 
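
The sequence listings use rows that begin with the 1-based position of the first base, followed by blocks of ten. The following Python sketch joins such rows into one string and checks that each row label agrees with the number of bases read so far, which helps to catch dropped or duplicated blocks. The function name `parse_numbered_rows` is hypothetical, and rows carrying leading ".." placeholders (as in the partial sequences) are not handled.

```python
def parse_numbered_rows(text: str) -> str:
    """Join numbered sequence rows, verifying each row's starting position."""
    sequence = []
    length = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines between rows
        offset, blocks = int(parts[0]), parts[1:]
        if offset != length + 1:
            raise ValueError(f"row labelled {offset} but {length} bases read so far")
        row = "".join(blocks)
        sequence.append(row)
        length += len(row)
    return "".join(sequence)


if __name__ == "__main__":
    rows = """
    1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA

    51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG
    """
    print(len(parse_numbered_rows(rows)))
```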

This corresponds to the amino acid sequence <SEQ ID 632; ORF146-l>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 
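
The partial ORF146 sequence corresponds to the C-terminal part of ORF146-1 but differs from it at one residue (D in LWLSTDMRQE of the partial sequence, N in the complete one). The following Python sketch shows one simple way to locate a fragment within a longer sequence while tolerating such mismatches, by choosing the ungapped offset with the fewest differences. The function name `best_offset` is hypothetical and the approach is an assumption, not the method used in the text.

```python
from typing import Optional, Tuple


def best_offset(fragment: str, full: str) -> Optional[Tuple[int, int]]:
    """Return (0-based offset, mismatches) of the best ungapped placement."""
    best = None
    for offset in range(len(full) - len(fragment) + 1):
        window = full[offset:offset + len(fragment)]
        mismatches = sum(1 for a, b in zip(fragment, window) if a != b)
        # Keep the first placement with the lowest mismatch count.
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


if __name__ == "__main__":
    fragment = "LWLSTDMRQE"                 # from ORF146
    region = "EHLHYQWQGFLWLSTNMRQEISALV"    # from ORF146-1
    print(best_offset(fragment, region))
```

Because no gaps are allowed, this only suits fragments that are expected to be collinear with the full sequence.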

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from strain A of N. 
meningitidis: 



10 20 30 
orf146.pep RHARRIRIDTAINPELEALAEHLHYQWQGF 
orf146a KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 
280 290 300 310 320 330 

40 50 60 70 
orf146.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX 
orf146a LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX 
340 350 360 370 

The complete length ORF146a nucleotide sequence <SEQ ID 633> is: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA 

401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 634>: 



1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLTDC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHS* 

ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 



orf 14 6a . pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 
IIIIIMIIMII IIMIIIIM II I I I Ml II I II I M IMMM II I II MIDI II I 
orf 146-1 MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 

or f 1 4 6a . pep LGMLQFQGAI YSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
MIMIII II I II Ml I llllll III M 111 IN II I II II I I 111 II I llllill I II I 
orf 146-1 LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

or f 1 4 6a . pep VGKNGYVPMLAGLTMC^LIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
INN Mill Ml II I III 111 1111:1 IIMIIII I II Ml I Ml II I III II IIMI I 
orfl46-l VGKNGYVPMIAGLTMCMLIGDNGSEWLDSGL^1RAMNVLIGAAIAIAAAKLLPLKSTLMWR 

or f 1 4 6a . pep FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

I MM 11:111 1 I llllll III 1 1 II M M MM 1 1 I Mill M I I II I 1 1 MUM Ml 
orf 14 6-1 FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

or f 14 6a . pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
1 1 M 1 1 1 1 1 1 1 1 It M 1 1 1 M 1 ! M I I K 1 1 1 1 1 1 1 1 II M 1 1 I! ! I M 1 1 1 1 1 1 1 II 1 1 I 
orf 146-1 AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

or f 1 4 6a . pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

I I M 1 1 M M I 1 I M 1 1 I I I II M I I I I II I I I M M I II M II I M I M II II I M I II 
or f 1 4 6- 1 RHARRIRI DTAINPELEALAEHLHYQWQG FLWLSTNMRQE I S ALVI LLQRTRRKWLDAHE 

or f 14 6a . pep RQHLRQSLLETREHSX 

M II I I I II M I I I: 
orf 14 6-1 RQHLRQSLLETREHGX 
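
The identity figure quoted for this alignment can be reproduced by counting matching positions. Comparing SEQ ID 634 with SEQ ID 632 gives differences at residue 147 (F versus L), residue 188 (T versus A) and the final residue 375 (S versus G); over the first 374 residues, 372 are identical, i.e. 99.5%. The following Python sketch of the calculation assumes two already-aligned strings of equal length, with `-` marking a gap; the function name `percent_identity` is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns holding the same residue in both rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("empty alignment")
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Third block of the ORF146a / ORF146-1 alignment (one F/L difference).
    orf146a = "VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR"
    orf146_1 = "VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR"
    print(round(percent_identity(orf146a, orf146_1), 1))
```

Alignment programs differ in whether gap columns are included in the denominator; here every column counts.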



Homology with a predicted ORF from N.gonorrhoeae 

ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N.gonorrhoeae: 



or f 1 4 6 . pep RHARRIRIDTAINPELEALAEHLHYQWQGF 30 

I I I I I I I I I I II I I II I i I I I I I I II M I I 
orf 14 6ng KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 364 

or f 1 4 6 . pep LWLSTDMRQEI SALVI LLQRTRRKWLDAHERQHLRQSLLETREHG 7 5 

llllhllllllllll II II I M M I i I! M II II II I II II I I 
o r f 1 4 6ng LWLSTNMRQEISALVI PLQRTRRKWLDAHERQHLRQSLLETREHG 409 

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino 
acid sequence <SEQ ID 636>: 



1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDMNSSQR KRLSGRWLNS 

51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEWIGMT VFVVLGMLQF 

101 QGAIYSNAVE RMLGTVIGLG AGLGVLWLNQ HYFHGNLLFY LTIGTASALA 

151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA 

201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR 

251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK 

301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL 

351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ 

401 SLLETREHG* 



Further work revealed the following gonococcal DNA sequence <SEQ ID 637>: 

1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 
51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 
101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 
151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 
201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 
251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 
301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 
351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 
401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 
451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC 
501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 
551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 
601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 
651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 
701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 
751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 
801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 
901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 
1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 
1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 


This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-1>: 

1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG 
51 EWIGMTVFVV LGMLQFQGAI YSNAVERMLG TVIGLGAGLG VLWLNQHYFH 
101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 
201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH 
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING 
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 
351 TRRKWLDAHE RQHLRQSLLE TREHG* 



ORF146ng-1 and ORF146-1 show 96.5% identity in 375 aa overlap: 

orf146-1.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 
orf146ng-1 MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVV 

orf146-1.pep LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
orf146ng-1 LGMLQFQGAIYSNAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAA 

orf146-1.pep VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
orf146ng-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf146-1.pep FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
orf146ng-1 FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP 

orf146-1.pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
orf146ng-1 SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING 

orf146-1.pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
orf146ng-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

orf146-1.pep RQHLRQSLLETREHGX 
orf146ng-1 RQHLRQSLLETREHGX 


Furthermore, ORF146ng-l shows homology with a hypothetical E.coli protein: 

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SRMC INTERGENIC REGION 
>gi|1736674|gnl|PID|d1016553 (D90838) ORF_ID:o348#20; similar to [SwissProt Accession 
Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|dl01656b (dSS 
ORF_ID:o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli] 
>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but 
has 203 additional C-terminal residues [Escherichia coli] Length = 352 
Score = 109 bits (271), Expect = 2e-23 

Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%) 

Query: 20 YRHRRLIHAVTUjGGTVLFATALARLIJILQ 79 

YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+ 
Sbjct: 15 YRHYRI VHGTRVALAFLLT FLI IRLFT I PE STWPLVTMWIMG PI S FWGNWPRAFERIG 74 

Query: 80 GTVIGLGAGLGVLWLNQH YFHGNLLFYLT I GTASALAGWAAVGKNGYVPMLAGLTMCML I 139 

GTV+G GL L L L + A L GW A+GK Y +L G+T+ +++ 

Sbjct: 75 GTVLG S I LGL I ALQLE LI SLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAI W 131 

Query: 140 GDNGSEWLDSGLMRAMWLIGXXXXXXXXKLLPI^ 199 

G E +D+ L R+ +V++G + P ++ + WR LA +L + +++ + 

Sbjct: 132 GSPTGE-IDTALWRSGDVILGSLLAMLFTGIWPQRAFIHWRIQLAKSLTEYNRVYQSAFS 190 

Query: 200 GRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISPSM 259 

+ R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V 

Sbjct: 191 PNLLERPRLE SHLQKLL TDAVKMRGLIAPASKETRIPKSIYEGIQTINRNLVCMLEL 247 

Query: 260 XXXXXXXXQSPK LNGSEIRLLDRHFXXXXXXXXXXAALINGRHARRIRIDTAINPEL 316 

+ LN ++R D AL G +N + 

Sbjct: 248 QINAYWATRPSHFVLLNAQKLR — DTQHMMQQI LLSLVHALYEGNPQPVFANTEKLNDAV 305 

Query: 317 EALAEHL — HYQWQ G FLWLSTNMRQE I S ALVI LLQRTRRK 354 

E L + L H+ + G++WL+ ++ L L+ R RK 

Sbjct: 306 EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352 

On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 76 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 639>: 

1 . . GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 
51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 
101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 
151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 
201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC . GCGGTGA 
251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 
301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 
351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 
401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 
451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 
501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 
551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 
601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 
651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 
701 CTTTGTACGA T . . 

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 

1 . .AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD 
51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN 
101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM 
151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL 
201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD . . 

Further work revealed the complete nucleotide sequence <SEQ ID 641>: 

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 
251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 
301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 
351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 
401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 
451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 
501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 
551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 
601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 
651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 
701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 
751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 
801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 
851 TGGCTCTGTC TTGGAAAAAC AAATAG 


This corresponds to the amino acid sequence <SEQ ID 642; ORF147-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 
51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 
101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP 
151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 
201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 
251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein ORF286 of E.coli (accession number U18997) 

ORF147 and E.coli ORF286 protein show 36% aa identity in 237aa overlap: 

Orf147: 1    AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPG 60 
             AEDTR T  LL  +GI  +L ++ +HNE+Q A+ ++  L +G  +A VSDAGTP + DPG 
Orf286: 43   AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102 

Orf147: 61   AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120 
               L R  RE                           F + GF+P KS  RR 
Orf286: 103  YHLVRTCREAGIRVVPLPGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAE 162 

Orf147: 121  AFPIVMFETPHRIGAALADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALSADGD 179 
                ++ +E+ HR+  +L D+  +  E R ++LARE+TKT+ET     VGE+   +  D + 
Orf286: 163  PRTLIFYESTHRLLDSLEDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDEN 222 

Orf147: 180  QSRGEMVLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALY 236 
             + +GEMVL++      + E L   A   + +L AELP K+AA LAA+I G  K ALY 
Orf286: 223  RRKGEMVLIV-EGHKAQEEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALY 278 
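The identity figure quoted for an alignment is the number of identical columns divided by the length of the overlap. The following is a minimal sketch of that calculation; the function name `percent_identity` is illustrative, and gap columns (`-`) are counted as part of the overlap but never as identities, which is an assumption about how the figure above was derived.

```python
def percent_identity(query: str, subject: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    # A column is an identity when both residues match and it is not a gap.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query), 100.0 * identical / len(query)


# Last 26 columns of the ORF147 / ORF286 alignment shown above.
q = "LTAELPTKQAAELAAKITGEGKKALY"
s = "LQAELPLKKAAALAAEIHGVKKNALY"
print(percent_identity(q, s))
```

Applied to every column of the alignment rather than to this excerpt, the same count is what the 36% figure over 237 columns summarises.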

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF147 shows 96.6% identity over a 237aa overlap with ORF75a from strain A of N. meningitidis: 

                                                  10        20        30 
orf147.pep                                AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
                                          |||||||||||||||||||||||||||||| 
orf75a      TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
               20        30        40        50        60        70 

                    40        50        60        70        80        90 
orf147.pep  MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 
            |||||||||||||||||||||||||||||||||||||||:|||||||||| ||||||||| 
orf75a      MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKVVPVVGASAVMAALSVA 
               80        90       100       110       120       130 

                   100       110       120       130       140       150 
orf147.pep  GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 
            || ||||||||||||||||||||||||||:|||:|||||||||||:|||||||||||||| 
orf75a      GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIGATLADMAELFPERRLM 
              140       150       160       170       180       190 

                   160       170       180       190       200       210 
orf147.pep  LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
            ||||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||| 
orf75a      LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
              200       210       220       230       240       250 

                   220       230 
orf147.pep  LTAELPTKQAAELAAKITGEGKKALYD 
            ||||||||||||||||||||||||||| 
orf75a      LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
              260       270       280       290 

ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75. 
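For two ungapped, equally long stretches such as those in the strain A comparison, the differing positions can be listed directly. The following is a minimal sketch; the function name `substitutions` is illustrative. The final line assumes that eight columns of the 237-column overlap differ, counting the unresolved residue X as a mismatch, which is consistent with the stated 96.6%.

```python
def substitutions(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based column, residue in a, residue in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# Residues 91-100 of orf147.pep against the aligned residues of orf75a.
print(substitutions("GVEGSDFYFN", "GVAGSDFYFN"))

# Assumed count: 8 differing columns in a 237-column overlap.
print(round(100 * (237 - 8) / 237, 1))
```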
Homology with a predicted ORF from N. gonorrhoeae 

ORF147 shows 94.1% identity over a 237aa overlap with a predicted ORF (ORF147ng) from N. 
gonorrhoeae: 

orf147.pep                                AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 30 
                                          ||||||||||||||||||:||||||||||| 
orf147ng    TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 85 

orf147.pep  MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 90 
            ||||::|:||||:||||||||||||||||||||||||||||||||||||| ||||||||| 
orf147ng    MADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGASAVMAALSVA 145 

orf147.pep  GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150 
            ||  |||||||||||||||||||||||||||||:|||||||||||:|||||||||||||| 
orf147ng    GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATLADMAELFPERRLM 205 

orf147.pep  LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210 
            ||||||||||||||||||||||||:|||:||||||||||||||||||||||||||| ||| 
orf147ng    LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 265 

orf147.pep  LTAELPTKQAAELAAKITGEGKKALYD 237 
            |:||||||||||||||||||||||||| 
orf147ng    LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 



An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino 
acid sequence <SEQ ID 644>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 
51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 
101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES 
151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 
201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 
251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 
301 * 



Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 
51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 
101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 
151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 
201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 
251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 
301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 
351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 
401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 
451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 
501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 
551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 
601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 
651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 
701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 
751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 
801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 
851 TGGCACTGTC GTGGAAAAAC AAATGA 

This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 
51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 
101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP 
151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 
201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 
251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF147ng shows homology to a hypothetical E.coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION (F286) 
>gi|606086 (U18997) ORF_f286 [Escherichia coli] 
>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region 
[Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 
Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4    KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 
            K  Q A +S   G LY+V TPIGNLADIT RAL VLQ  D+I AEDTR T  LL  +GI 
Sbjct: 2    KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

Query: 64   GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123 
             RL ++ +HNE+Q A+ ++  L +G  +A VSDAGTP + DPG  L R  REAG +VVP+ 
Sbjct: 60   ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL 119 

Query: 124  VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183 
             G  A + ALS AG+    F + GF+P KS  RR            ++ +E+ HR+  +L 
Sbjct: 120  PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179 

Query: 184  ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242 
             D+  +  E R ++LARE+TKT+ET     VGE+   +  D N+ +GEMVL++      + 
Sbjct: 180  EDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238 

Query: 243  HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286 
             E L   A   + +L AELP K+AA LAA+I G  K ALY  AL 
Sbjct: 239  EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282 
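The summary line of a BLAST-style report can be cross-checked against the line printed between the two sequences, in which a letter marks an identity and `+` marks a conservative substitution. The following is a minimal sketch of that check; the function names `midline_counts` and `percent` are illustrative, and the reading of the middle line described here is an assumption about the report format rather than something stated in the specification.

```python
def midline_counts(midline: str, length: int) -> dict[str, int]:
    """Count identities (letters) and positives (letters or '+') in a midline."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return {"identities": identities, "positives": positives, "length": length}


def percent(n: int, total: int) -> int:
    """Whole-number percentage, rounded to the nearest integer."""
    return round(100 * n / total)


# Middle line of the final 44-column block of the report above
# (spacing as printed; only the symbols matter for the counts).
print(midline_counts("EL A + +L AELP K+AA LAA+I G K ALY AL", 44))

# Summary figures quoted in the report above.
print(percent(128, 284), percent(171, 284), percent(4, 284))
```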

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N.meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
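The specification does not say which program predicted the transmembrane domain. One common approach is a sliding-window hydropathy profile, sketched below using the Kyte-Doolittle scale with a 19-residue window and a threshold of 1.6; the scale, window, threshold and function names are all assumptions chosen for illustration, not the method used for the analysis above.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy(seq, window=19):
    """Mean hydropathy of every full window along seq (unknown residues score 0)."""
    scores = [KD.get(a, 0.0) for a in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [i + 1 for i, h in enumerate(hydropathy(seq, window)) if h > threshold]


# N-terminal 50 residues of SEQ ID 646 as listed above.
print(candidate_tm_windows("MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDT"))
```

Windows returned by `candidate_tm_windows` are only candidates; a dedicated topology predictor would be needed to confirm them.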

Example 77 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 647> 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 
51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT 
101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 
301 GTGGCGGcAT TGGTGGGCGt ATCAATATAT TGTGAGCGTG GCACATAACG 
351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA tATCCC.GAT 
401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG 
451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA 
501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG 
551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC 
601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA 
651 GTTCATATCA TATTGCAAGT . . 
701 . . GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA 
751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA 
801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC 
851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA 
901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC 





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951 
1001 
1051 
1101 
1151 
1201 
1251 
1301 

2101 
2151 
2201 
2251 
2301 
2351 
2401 
2451 
2501 
2551 
2601 
2651 
2701 
2751 
2801 
2851 
2901 
2951 
3001 

3551 
3601 
3651 
3701 
3751 
3801 
3851 
3901 
3951 
4001 
4051 
4101 
4151 
4201 
4251 
4301 
4351 



ATGAACACAA 
TTTAATGTTT 
AGGTGGTGTC 
CCTTTATTGA 
CAAGGTGCTG 
AAATAACGAA 
CCGTTACTTG 
GGCAAAGGCA 



TTCTCTGCCT 
CTTTATCCGA 
AACAGTTATC 
CGAAGGAAAA 
GAGGATTATA 
ACTTGGCAAG 
GAAAGTAAAC 
CGCTG 



AATAGATTAA 
GACAGCAAGA 
GACCCAGACT 
GGCGAATTGA 
TTTCCAAGGA 
GCGCGGGCGT 
GGCGTGGCAA 



AAACACGAAC 
GAACCTGTTT 
GAATAATGGA 
TACTTACCAG 
GATTTTACGG 
TCATATCAGT 
ACGACCGCCT 



CGTTCAATTG 
ATCATGCTGC 
GAAAATATTT 
CAACATCAAT 
TCTCGCCTGA 
GAAGACAGTA 
GTCCAAAATC 



// 



TGACTGCTTC 
GATCACGCTC 
TAGTGCAAAT 
ACGGCAACCk 
ACATTAAACG 
CGACCACGCC 
CAAACGTAAG 
GCAGTATTCC 
CAagGATACG 
GarCGGAATT 
TCCGCCTATC 
TGCGCCGCGC 
CACCGCCAAC 
AAATTGAACG 
CCGCAGCGAC 
TGGCGGTCAA 
GTAGTGGAAG 
CCTGCAAAAC 



ATTGACTAAG 
ATTTAAATCT 
GGCGATACAC 
TAgCCtCGtG 
GCAACACATC 
GTACAAAACG 
CCATTCCGCA 
ATTTTGAAAG 
GCATTACACT 
AGGCAATTTA 
GCCACGATGC 
CGCCGTTCGC 
TTCGGTAGAA 
GTCAGGGAAC 
AAATTGAAGC 
CAATACCGGC 
GAAAAGACAA 
GAACACGTCG 



CCGCAACGCC 
CGCAAGATTT 
ATGCAGAAAA 
CCGGACCGAA 
CCCACGGCGC 
ATCAGnCGCG 
GAGsmAAAwT 
CGCGCCGgtt 
ctATTTCGTC 
CCCCCGGCCT 
TCATTCAAAC 
CTATACCGAT 
TATTGGCTCA 
GCCGAAATCA 
CCCGCAACTG 
GGTAA. . . 



GTTTGGACAA 
CCGCGCCTAC 
ACCTCGGCAG 
AACACCTTCG 
CGTTTTCGGG 
GGCGCGGGTT 
CCGCCGCCGC 
tCggCGgATt 
CAAAAAGCGG 
TGCATTCAAC 
CGGCGCAACA 
GCCGCTTCGG 
GGATTTCGGC 
AAGGTTTCAC 
GAAGCGCAAC 



ACCGACATCA 
CACAGGGCTT 
GTTATACAGT 
G.sAATGcCC 
GGCTTCgGGC 
GCAGTCTGAC 
CTCAACGGTA 
CAGCCGCTTT 
TAAAAGACAG 
AACCTTGACA 
GGCAGGGGCG 
GCCGTTCGCG 
TCCCGTTTCA 
ATTCCGCTTT 
TGGCGGAAAG 
AACGAACCTG 
CAAACCGCTG 
ATGCAGGCGC 
// 

.... TTAGAC 
GCGGCATCCG 
CGCCAACAAA 
CGGGCGCGTC 
ACGACGGCAT 
CAATACGGCA 
TTAGCAGCGG 
GTGCtGCATT 
CGGCATCGAA 
ATTACCGCTA 
CGcTACCGCG 
CATTTCCATC 
GCAAAGTCCG 
AAAACCCGCA 
GCTGTCCCTC 
ACAGCGCGGG 



GCGGCAATGT 
GCCACACTCA 
CAGCCACAAC 
AAGCAACATT 
AATGCTTCAT 
GCTTTCCGGC 
ATGTCTCCCT 
ACCGGACAAA 
CGAATGGACG 
ACGCCACCAT 
CAAACCGGCA 
CCGTTCCCTA 
ACACGCTGAC 
ATGTCGGAAC 
TTCCGAAGGC 
CAAGCCTCGA 
TCCGAAAACC 
GTGG 



. . .GATAAAG 
CGATCTTGCC 
ACGGCAATCT 
GCCACCCAAA 
TAATCAAGCC 
TTAATCTAAG 
AACGCTAAGG 
AGCCGATAAG 
TCAGCGGCGG 
CTGCCGTCAg 
TACaCTCAAT 
GTGCGACAGA 
TTATmCGTTA 
GGTAAACGGC 
TCTTCGGCTA 
ACTTACACCT 
ACAATTGACG 
TTAATTTCAC 



CGCGTATTTG 
GGACACCAAA 
CCGACCTGCG 
GGCATCCTGT 
CGGCAACTCG 
TCGACAGGTT 
CAGCCTTTcA 
ACGGCATTCA 
CCGCACATCG 
CGAAAACGTC 
CGGGCATTAa 
ACGCCTTATT 
AACACGCGTC 
GTGCGGAATG 
CACGCTGCCG 
CATCAAATTA 



CCGAAGACCG 
CACTACCGTT 
CCAAATCGGT 
TTTCGCACAA 
GCACGGCTTG 
CTACATCGGC 
GACGGCATCG 
GGCACGAtAC 
GCGCAACGCg 
AATATCGCCA 
GGCAGATTAT 
TGAGCCTGTC 
AATACCGCCG 
GGgCGTAAAC 
CCGCCAAAGG 
GGCTACCGCT 
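The partial sequence contains letters other than A, C, G and T (for example k, m, r, s, w and n). Assuming these follow the IUPAC nucleotide ambiguity codes, which the specification does not state explicitly, the concrete sequences consistent with a short ambiguous stretch can be enumerated as in the following minimal sketch; the function name `expand` is illustrative.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (assumed meaning of the non-ACGT letters).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def expand(seq: str) -> list[str]:
    """List every unambiguous sequence consistent with an IUPAC-coded string."""
    choices = [IUPAC[ch] for ch in seq.upper()]
    return ["".join(p) for p in product(*choices)]


# A stretch from row 501 of the partial sequence <SEQ ID 647>.
print(expand("AATwTG"))
```

The number of expansions grows multiplicatively with each ambiguous position, so this is only practical for short stretches.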



This corresponds to the amino acid sequence <SEQ ID 648; ORF1>: 



l 

51 
101 
151 
201 
251 
301 
351 
401 

701 
751 
801 
851 
901 
951 
1001 

1151 
1201 
1251 
1301 
1351 



MKTTDKRTTE 
YQYYRDFAEN 
VAALVGVQYI 
TKGHPYGGDY 
GRQYWRSDED 
KWLINGVLQT 
YSFNDDNNGT 
GGVNSYRPRL 
NNETWQGAGV 



THRKAPKTGR 
KGKFAVGAKD 
VSVAHNGGYN 
HMPRLHKXVT 
EPNNRESSYH 
GNPYIGKSNG 
GKINAKHEHN 
NNGENISFID 
HISEDSTVTW 



SANGDTRYTV 
DHAVQNGSLT 
KDTALHLKDS 
APRRRSRRSR 
RSDKLKLAES 
LQNEHVDAGA 



DKVTAS 

SHNATQNGNX 
LSGNAKANVS 
EWTLPSGXEL 
RSLLXVTPPT 
SEGTYTLAVN 
W 



IRFXAAYLAI 
IEVYNKKGEL 
NVDFGAEGXN 
DAEPVEMTSY 
XAS ....... 

FQLVRKDWFY 
SLPNRLKTRT 
EGKGELILTS 
KVNGVANDRL 
II 

LTKTDISGNV 
SLVXNAQATF 
HSALNGNVSL 
GNLNLDNATI 
SVESRFNTLT 
NTGNEPASLE 



CLSFGILPQA 
VGKSMTKAPM 
IXDQXRXTYK 
MDGRKYIDQN 

GS 

DEIFAGDTHS 
VQLFNVSLSE 
NINQGAGGLY 
SKIGKGTL. . 



WAGHTYFGIN 
IDFSWSRNG 
IVKRNNYKAG 
NYPDRVRIGA 
PMFIYDAQKQ 
VFYEPRQNGK 
TAREPVYHAA 
FQGDFTVSPE 



DLADHAHLNL 
NQATLNGNTS 
ADKAVFHFES 
TLNSAYRHDA 
VNGKLNGQGT 
QLTWEGKDN 



TGLATLNGNL 
ASGNASFNLS 
SRFTGQISGG 
AGAQTGSATD 
FRFMSELFGY 
KPLSENLNFT 



// 



RNAVWTSGIR DTKHYRSQDF 
RTENTFDDGI GNSARLAHGA 
XKXRRRVLHY GIQARYRAGF 
PGLAFNRYRA GIKADYSFKP 



LDRVFAEDR 

RAYRQQTDLR QIGMQKNLGS GRVGILFSHN 
VFGQYGIDRF YIGISAGAGF SSGSLSDGIG 
GGFGIEPHIG ATRYFVQKAD YRYENVNIAT 
AQHISITPYL SLSYTDAASG KVRTRVNTAV 
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1401 
1451 



LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW 



Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 
51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 
101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 
301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 
351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 
401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT 
451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 
501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 
551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 
601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 
651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 
851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 
901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 
951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 
1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 
1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 
1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 
1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 
1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 
1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 
1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 
1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 
1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG 
1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC 
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 
1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC 
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 
1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 
1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 
1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 
1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 
2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA 
2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA 
2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 
2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 
2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 
2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG 
2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT 
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 
2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 
2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 
2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 
2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA 
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 
2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC 
2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT 
2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 
2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA 
2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC 
3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG 
3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG 
3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 
3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 
3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 
3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC 
3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA 
3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG 
3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC 
3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC 
3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG 
3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC 
3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG 
3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA 
3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC 
3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT 
3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA 
3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC 
3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA 
3951 CGGCATTCAG GCACGATACC GCGCCGGTTT CGGCGGATTC GGCATCGAAC 
4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC 
4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC 
4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA 
4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA 
4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG 
4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC 
4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC 
4351 ATCAAATTAG GCTACCGCTG GTAA 
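A complete open reading frame is expected to start with ATG, have a length divisible by three, end in a stop codon and contain no earlier in-frame stop. The following is a minimal sketch of such a check; the function name `orf_summary` and the returned field names are illustrative, and the standard stop codons are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_summary(dna: str) -> dict:
    """Summarise whether a listed sequence looks like a complete ORF."""
    # Keep letters only, so position numbers and spaces from a listing are ignored.
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    ends_with_stop = bool(codons) and codons[-1] in STOP_CODONS
    return {
        "length_nt": len(seq),
        "in_frame": len(seq) % 3 == 0,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": ends_with_stop,
        "internal_stops": sum(c in STOP_CODONS for c in codons[:-1]),
        "residues": len(codons) - 1 if ends_with_stop else len(codons),
    }


# Final row of SEQ ID 649 as listed above (not a full ORF on its own).
print(orf_summary("4351 ATCAAATTAG GCTACCGCTG GTAA"))

# The listing ends at base 4374, so a complete ORF would encode this many residues.
print(4374 // 3 - 1)
```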

This corresponds to the amino acid sequence <SEQ ID 650; ORF1-1>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN 
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 
101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT 
151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG 
201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 
251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 
301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS 
351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE 
401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK 
451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA 
501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 
551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD 
601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 
651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAVVSRNVAK 
701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS 
751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL 
801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS 
851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL 
901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT 
951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN 
1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG 
1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES 
1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR 
1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD 
1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV 
1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG 
1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY 
1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR 
1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG 
1451 IKLGYRW* 


Computer analysis of these sequences gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF1 shows 57.8% identity over a 1456aa overlap with an ORF (ORF1a) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf 1 . pep MKTTDKRTTETHRKAPKTG RIRFXAAYLAICLSFGIL PQAWAGHTYFGINYQYYRDFAEN 
I | M I I I I I I I I I I I I I I I I I I I i II I I I I I I I I I I I I I 1 I t I I I I I I I I 1 I I I I II I 
orf la MKTTDKRTTETHRKAPKTG RIRFSPAYLAICLSFGIL PQAWAGHTYFGINYQYYRDFAEN 
10 20 30 40 50 60 



orf 1. pep 



70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 
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II II! tit! II III MINI III Mill MINIMUM MM!) II MINI Mill 
orfla KGKFAVGAKDIEVYNBCKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

70 80 90 100 110 120 



10 



15 



orf l.pep 
orfla 

orf l.pep 
orfla 



130 140 150 160 170 180 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 
II II Mil II II I :|:M!IIII! :: M i : M I II III I I Mill! Ml I 
NVDFGAEGXN-PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD 

130 140 150 160 170 

190 200 210 

MDGRKYI DQNNYPDRVRIGAGRQYWRS DEDE P NN 

! I I I :: M I : II M I M : : 1 II |:|: || 
MRGNTYSDKEKYPERVRIGSGHHYWRYDDDJCHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 
180 190 200 210 220 230 



20 



25 



30 



35 



40 



220 230 240 250 260 

orf 1 . pep RESSYH IA SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRK 

I : : : : I I I I I I I I I M : : I I I : I I I I I M I I I : I I I I I : I I 

orfla SGDVRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK 
240 250 260 270 280 290 

270 280 290 300 310 320 

orf 1 . pep DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV 

Mill:!: !MI:I : I I I : I I : : I I : : : 1 1 II I :: : I : I I : I I : : I I : M : 
orfla DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 
300 310 320 330 340 350 

330 340 350 360 370 380 

orf 1 . pep SLSETAREPVYHAAGGVNSYRPRLNNGENISFI DEGKGELILTSNINQGAGGLYFQGDFT 

MM! MM! I II II I : I I II I I I I I I : M 1 I I : i : I I I : : I II I II I I II I : I I M 
orfla SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 

360 370 380 390 400 410 

390 400 410 420 430 

orf 1 . pep VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTL 

II I I II II I I I! I I II M I II M I I I M I II I I I I I I I I II I 
orfla VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT 
420 430 440 450 460 470 



45 



orf l.pep 
orfla 



VILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGHSLSFH 
480 490 500 510 520 530 



50 



orf l.pep 
orfla 



RIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFGEKDTTK 
540 550 560 570 580 590 



55 



orf l.pep 
orfla 



TNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSKMEG 
600 610 620 630 640 650 



60 



65 



70 



orf 1 .pep 
orfla 

orf l.pep 
orfla 

orf l.pep 



IPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH 
660 670 680 690 700 710 

440 450 460 470 480 

XXXXXDKVTASLTKTDISGNVDLADHAHLNLTGLATLNGNLSAN 

: 11:11111111111111 Ml Ml I MM! 

TICTRSDWTGLTNC^XXITDDICVIASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLSAN 
720 730 740 750 760 770 

490 500 510 520 530 540 

GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG 
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10 



15 



20 



lltlilll llllltl! Ill lllillllilllll:! lllilll||::|:ll)IIM! 
orfla GDTRYTVS HNATQNGNLS LVGNAQATFNQATLNGNXSXSGNAS FNLSNNAAQNGS LTLS D 

780 790 800 810 820 830 

550 560 570 580 590 600 

orf 1 . pep KAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 
I I I I I I I 11 II I I I I I I I I I I I I I I I : I I I I I I : I I : I I I I II I I I I I I I I I I : I I I II 
orfla NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 
840 850 860 870 880 890 

610 620 630 640 650 660 

orf 1 . pep NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 
I I I I I I I M I I I I I II I I II I M ::|:llllllll II I I II I I I I I I I I I I I I II 

orfla NLDNAT ITLNSAYRHDAAGAQTGXVS DT PRRRSRRS LLSVTPPTSVESRFNTLTVNG 

900 910 920 930 940 950 

670 680 690 700 710 720 

orf 1 - pep KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKPL 
Ml I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 1 II I : II : I I I I I I I I I I I I I 
orfla KLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTVVEGKDNKPL 
960 970 980 990 1000 1010 



25 



730 740 750 

or f 1 . pep SENLNFTLQNEHVDAGAW 

I I I I I I I I I I II I I I I I I 

orfla SENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAEKDNAQS 
1020 1030 1040 1050 1060 1070 



30 orfl.pep 



orfla LDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQREAETRP 
1080 1090 1100 1110 1120 1130 

35 760 

orfl.pep LDR 

III 

orfla XTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAVQDELDR 
1140 1150 1160 1170 1180 1190 

40 

770 780 790 800 810 820 

or f 1 . pep VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
I I I I I I I I I I I I I II I 1 II I I II I II I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I 
orfla VFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
45 1200 1210 1220 1230 1240 1250 

830 840 850 860 870 880 

orf 1 . pep TFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQA 
: I I I I I I I I I I I I I I I I I I II I I II 1111:1111111 llllll I llillllllll 
50 orfla XFDDGIGNSARLAHGAVFGQYGIGRFDIGI STGAGFS SGXLSDGIGGKIRRRVLHYGIQA 

1260 1270 1280 1290 1300 1310 

890 900 910 920 930 940 

orf 1 . pep RYRAGFGGFGIE PHI GATRYFVQKADYRYENVNI AT PGLAFNRYRAG IKADYS FKPAQHI 

55 * I I I I I I I I I I I I I : I I I I I I I I I I I I I I I M I I I 1 I II I 1 I I I I I I I I I I I I I I I M I I 

orfla RYRAGFGGFG IE P YI GATRY FVQKAD YRYEN VN I AT PG LAFNRYRAG I KAD Y S FKPAQHX 

1320 1330 1340 1350 1360 1370 

950 960 970 980 990 1000 

60 orf 1 . pep SITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGP 

I I I I I I I I I I I II I I I I II I I M I II I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I 
orfla SITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHAAAAKGP 
1380 1390 1400 1410 1420 1430 

65 1010 1020 

orf 1 . pep QLEAQH SAG I KLG YRWX 

I I I II M M Ml II I II 
orfla QLEAQHSAGIKLGYRWX 
1440 1450 
1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC 

401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT 

451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT 

501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT 

551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC 

601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG 

651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG 

701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT 

751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA 

801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT 

851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG 

901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC 

951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA 

1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG 

1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT 

1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG 

1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC 

1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT 

1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG 

1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG 

1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA 

1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG 

1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC 

1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT 

1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT 

1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT 

1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC 

1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG 

1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG 

1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC 

1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT 

1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG 

1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA 

2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC 

2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG 

2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC 

2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA 

2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC 

2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC 

2301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA 

2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA 

2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA 

2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC 

2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT 

2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC 

2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG 

2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC 

2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA 

2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT 

2801 TATCGGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG 

2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT 

2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA 

2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT 

3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT 

3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC 

3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA 

3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA 

3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG 

3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG 

3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC 

3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG 

3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA 

3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG 

3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG 

3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC 






3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA

3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC

3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG

3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA

3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA

3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC

3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC

3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT

4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC

4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT

4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA

4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG

4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA

4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC

4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA



This encodes a protein having amino acid sequence <SEQ ID 652>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSWSRNG

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN

151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH

201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP

251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW

301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ

351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN

401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL

451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG

501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH

551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR

601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG

651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL

701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS

751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ

801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN

851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN

901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT

951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD

1001 QLTWEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE

1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG

1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ

1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR

1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR

1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG

1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP

1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL

1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW*



A transmembrane region is underlined. 
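The relationship between the nucleotide listing and the X residues in the deduced protein can be illustrated with a short sketch. It assumes the standard genetic code and assumes that a codon containing N is reported as a definite residue only when all four possible bases give the same amino acid, and as X otherwise. The function names are illustrative and are not taken from the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_codon(codon):
    """Translate one codon; N is expanded to all four bases."""
    options = [BASES if base == "N" else base for base in codon.upper()]
    residues = {CODON_TABLE[a + b + c] for a, b, c in product(*options)}
    # A unique residue is reported as such, anything ambiguous becomes X.
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate a DNA string in frame 1, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


# Bases 3601-3639 of the nucleotide sequence above, and its last 15 bases.
print(translate("AACGCNGTTTGGACAAGCNGCATCCGGNACACCAAACAC"))
print(translate("GGCTACCGCTGGTAA"))
```

Under these assumptions GCN is read as A (all four GCx codons encode alanine), whereas NGC and NAC are read as X, which agrees with the residues NAVWTSXIRXTKH listed from position 1201 of the protein.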



ORF1-1 shows 86.3% identity over a 1462aa overlap with ORF1a:

10 20 30 40 50 60 

orf la . pep MKTTDKRTTETHRKAPKTGRIRFS PAYLAICLS FGI LPQAWAGHT YFG INYQYYRDFAEN 

IIIMII lllllillllllil MIIIIIIM! IIMIIMIIIMMMMM1IMI M 
orf 1-1 MKTTDKRTTET HRKAPKTGRIRFS PAYLAICLS FGI LPQAWAGHTY FG INYQYYRD FAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf la . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMI DFSWSRNGVAALVGDQY I VSVAHNGG YN 

1 1 1 1 i I I 1 1 1 1 1 1 1 1 1 I i 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 1-1 KGKFAVGAKDIEVYNKKGELVGKSMTKAPMI DFSWSRNGVAALVGDQY I VSVAHNGG YN 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf la . pep NVDFGAEGXN PDQHRFSYQI VKRNNYKPDNS-HPYNGDXHMPRLHKFVT DAE PVEMT S DM 

llllllli 1111111:1:11111111 :: 111:1! I I I I I I I 1 I I I I I t i I I I I I 
orf 1-1 NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 



orf la .pep 
orfl-1 

orf la. pep 
orfl-1 

orf la. pep 
orfl-1 

orf la. pep 
orfl-1 

orf la. pep 
orfl-1 

orf la. pep 
orfl-1 

orf la. pep 
orfl-1 

orf la. pep 
orfl-1 

orf la. pep 
orfl-1 

orf la. pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 



180 190 200 210 220 230 

RGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDL — SYSGA WLIGGNTHMQGWGNN 

I I l:::ll:llll|:|::||| |:|: :: || | ||:|||| |: ::: 
DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
190 200 210 220 230 240 

240 250 260 270 280 290 

GVXSLSGD-VRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENG 
I : : 1 : : : : : I : N : I : I : I I I I I I i I I I I : : I I I : I I | | ) | | || I : II 
GTVNLGSEKIKHS-PYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNG 
250 260 270 280 290 

300 310 320 330 340 350 

FQLIRKDWFYDDIYRGDTHTVXFEPRSNGHFSFTSN^GTGTVTETNEKVSNP-KLKVQT 
IM:lllllll:|: lllhl : I I I : I I : : I I : : : I I I I I :: :|: | | :|| :: | 
FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHfeHNSLPNRLKTRT 
300 310 320 330 340 350 

360 .370 380 390 400 410 

VRLFDESLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLY 
» s I ' ! 11:11 :llll HIMI: II I Mill I I: I III i : I : I t I : : t I I I I 1 1 I I I 
VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGEN IS FI DEGKGELI LTSNINQGAGGLY 
360 370 380 390 400 410 

420 430 440 450 460 470 

FEGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 

l-MMIMMIIII t 1 I I I I I 1 I I I I 1 I IIIIMM IlllilllllM 

FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 
420 430 440 450 460 470 

480 490 500 510 520 530 

SVGDGTVILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
M I 1 1 I 1 1 I 1 1 1 1 1 I i 1 1 1 1 1 I 1 1 1 1 1 III III III II II I III HI II Mill Mill 
SVGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
480 490 500 510 520 530 

540 550 560 570 580 590 

HSLSFHRIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFG 

IMMMIIMMIMI IMM::|: :|:| | |: :||IMMMI 

HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIAT-TGNN-NSLDSKKEIAYNGWFG 
540 550 560 570 580 590 

600 610 620 630 640 650 

EKDTTKTNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 
M M I I M I I I I II I I I I | || M II M I II II I I II II II I I I I II I M II I I I II : : 
EKDTTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDH 
600 610 620 630 640 650 

660 670 680 690 700 710 

WSKMEGIPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGV 
4 s IIMMIIIIIIMI I I M I I K I : K r | r 1 I i : M I I I I I : M MIMMIMM 
W SQKEGI PRGE I VWDNDW INRTFKAEN FQI KGGQAWSRNVAKVKGDWHLSNHAQAVFGV 
660 670 680 690 700 710 

72 0 730 740 750 760 770 

APHQSHTICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLX 
MIMMIMMMIMIMM MIMMIMMMI II | | |:| | s , 

APHQSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLN 
7 20 730 740 750 760 770 

780 790 800 810 820 830 

GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNG 
I H I I I I I I I M M I M I II I I I I I I I I I | | | | | K | | | | | | : | II I II Mll::|:||| 
GNLS ANGDTR YTVSHNATQNGNLS LVGNAQAT FNQATLNGNT SASGNAS FN LS DHAVQNG 
780 790 800 810 820 830 

840 850 860 870 880 890 

SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSG 
Hid I M 1 1 I I M I II I I I M i M M M I I : I I I I I I : I I : I IIIMMMMIMI 
SLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSG 
840 850 860 870 880 890 
900 910 920 930 940 
orf la . pep TELGNLNLDNAT ITLN SAYRHDAAGAQTGXVS DTPRRRSRRS LLSVTPPTSVESRFN 

I I I I I 1 1 I I I I I I I I II I I I I I I 1 1 I I I I : : I : I I I I I I I I I II I I I I I I II I I I I 
orf 1-1 TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN 

900 910 920 930 940 950 

950 960 970 980 990 1000 

or f la . pep TLTVNGKLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTVVEG 

I I I I t I I I I I I I I 1 M I I I I I I I I I I I : II , II I I : I! tl I I I I I I I I: I 1:1 I I ! I I I 
orf 1-1 TLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTVVEG 

960 970 980 990 1000 1010 



1010 1020 1030 1040 1050 1060 

or f la . pep KDNKPLSENLNFT LQNEHVDAGAWRYQL IRKDGEFRLHN PVKEQELS DKLGKAEAKKQAE 

II IMIMMIIM MINI III IMIi M1IIIIIMIIM IIIIMIIUMMIIM 
or f 1 - 1 KDNKPLSENLNFT LQNEHVDAGAWRYQLIRKDGE FRLHN PVKEQELS DKLGKAEAKKQAE 

1020 1030 1040 1050 1060 1070 



1070 1080 1090 1100 1110 1120 

orf la . pep KDNAQS LDAL IAAGRDAAEKTE S VAE PARXAGGENVGIMQAEEEKKRVQADKDSALAKQR 

I I I I I I I I I I I I I I I I I: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I: I I I I I I 
orf 1-1 KDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQADKDTALAKQR 

1080 1090 1100 1110 1120 1130 



1130 1140 1150 1160 1170 1180 

or f la . pep EAETRPXTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAV 
MINI MINIM I I 1 1 I 1 I I Ml MM MM II M I I I I I I I 1 I II I I I 1 1 I 
orf 1-1 EAETRPATTAFPRARRARRDLPQLQPQPQPQP— QRDLISRYANSGLSEFSATLNSVFAV 

1140 1150 1160 1170 1180 1190 



1190 1200 1210 1220 1230 1240 

orf la. pep QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
M I II I 1 1 II I I I I I I II I II I I I II M II I I I M II I I I I II I II I I I II I I I I I M 
orf 1-1 QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
1200 1210 1220 1230 1240 1250 



1250 1260 1270 1280 1290 1300 

orf la. pep HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 
I I 1 I I I r I I I I I I I I I I I f I 1 1 1 I I I I I I II IMIMMIIM i I I I I I I I I I i M I 
orf 1-1 HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
1260 1270 1280 1290 1300 1310 



1310 1320 1330 1340 1350 1360 

orf la . pep HYGIQARYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
I I I I II II I I I M I I I I I I M I II I I I I I II II I II I I I II I I II I I II I I M I I I II I I 
orf 1-1 HYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
1320 1330 1340 1350 1360 1370 



1370 1380 1390 1400 1410 1420 

or f la . pep KPAQHXS IT PYXSLS YTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 

IMM Mill MMIMMIMIIIMIMMIMIIIIIIMMMMIIIIM II 
or f 1- 1 KPAQHI S IT P YLS LS YT DAASGKVRTRVNTAVLAQ D FGKTRS AEWGVNAE IKGFTLS LHA 

1380 1390 1400 1410 1420 1430 

1430 1440 1450 

orf la .pep AAAKGPQLEAQHSAGIKLGYRWX 
I I I 11 II I I I II I I I I I I II I I I 
orf 1-1 AAAKGPQLEAQHSAGIKLGYRWX 
1440 1450 
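A percentage identity of the kind quoted for such an alignment can be sketched as follows. This is a minimal illustration that assumes identity is counted as the number of columns with the same residue in both sequences divided by the total number of aligned columns, gaps included. The program used for the alignments in this specification may count differently (for example in its treatment of X), and the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (identical columns, aligned columns, percent identity)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(seq_a)
    # A column is identical when both residues match and neither is a gap.
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return identical, columns, 100.0 * identical / columns


# Columns 121-161 of the ORF1a / ORF1-1 alignment, OCR spaces removed.
orf1a = "NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHM"
orf1_1 = "NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHM"

identical, columns, pct = percent_identity(orf1a, orf1_1)
print(identical, columns, round(pct, 1))
```

For this 41-column stretch the sketch counts 31 identical columns, i.e. about 75.6%, which is lower than the figure for the full overlap because the stretch contains one of the more divergent regions.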

Homology with adhesion and penetration protein hap precursor of H. influenzae (accession number P45387) 
Amino acids 23-423 of ORF1 show 59% aa identity with hap protein in 450aa overlap: 



orfl 23 FXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAENKGKFAVGAKDIEVYNKKGELVG 82 

F +L C+S GI QAW AGHT YFG I + YQYYRD FAENKGKF VGAK+IEVYNK+G+LVG 
hap 6 FRLNFLTACVSLGIASQAWAGHTYFGIDYQYYRDFAENKGKFTVGAKNIEVYNKEGQLVG 65 

orfl 83 KSMTKAPMI DFSWSRNGVAALVGVQYI VS VAHNGGYNNVDFGAEGXN IXDQXRXTYKI V 142 

SMTKAPMIDFSWSRNGVAALVG QYIVSVAHNGGYN+VDFGAEG N DQ R TY+IV 
hap 66 TSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYNDVDFGAEGRN-PDQHRFTYQIV 124 
KRNNY+A + HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 



KS DE DE PNNRE S S YH I A 222 

QYWR+D+DE N SSY+++ 

RTDKDEETNVHSSYYVSGAYRYLTAGNTHTQSGNGNGTVNLSGNWSPNHYGPLPTG 244 

SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRKDWFYDEIFAGDTHSVF 277 
SGSPMFIYDA+K++WLIN VLQTG+P+ G+ NGFQL+R++WFY+E+ A DT SVF 





orfl 


143 


5 


hap 


125 




orfl 


203 




hap 


185 


10 


orfl 


223 




hap 


245 i 


15 


orfl 


278 




hap 


305 * 




orfl 


335 , 


20 


hap 


364 i 




orfl 


394 ' 




hap 


i 

424 i 


25 


Amino acids 7 15-] 




Orfl 


41 




hap 


733 


30 


orfl 


99 i 




hap 


793 1 


35 


orfl 


159 : 




hap 


853 : 




orfl 


219 ( 


40 


hap 


1 

900 ( 




orfl 


279 ] 




hap 


960 ] 


45 


Amino acids 1192 




Orfl 


1 




hap 


1135 


50 


orfl 


61 




hap 


1195 


55 


orfl 


121 




hap 


1255 




orfl 


181 


60 


hap 


1315 




orfl 


241 


65 


hap 


1375 



Y P NG YSF +N+GTGK+ + + + + TV+LFN SL++TA+E V A 
.YIPPINGHYSFVSNNDGTGKLTLTRPSKDGSKAKSEVGTVKLFNPSLNQTAKEHV-KA 

IGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFTV-SPENNETWQGA 
A G N Y+PR+ G+NI D+GKG L + +NINQGAGGLYF+G+F V +NN TWQGA 
AAGYNIYQPRMEYGKNIYLGDQGKGTLTIENNINQGAGGLYFEGNFWKGKQNNITWQGA 

GVH I SEDSTVTWKVNGVANDRLSKIGKGTL 423 
GV I +D+TV WKV+ NDRLSKIG GTL 
GVS IGQDATVEWKVHNPENDRLSKIGIGTL 453 

Amino acids 715-1011 of ORF1 show 50% aa identity with hap protein in 258aa overlap: 

DTRYTVSHNATQ-NGNXSLVXNAQAT FNQ-ATLNGNTSASGNAS FNLS DHAVQNGSLTLS 
DT+ S TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 
")TKVINS I PITQINGS INLTNNATVN IHGLAKLNGNVTLI DHSQFTLSNNATQTGNIKLS 

INAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGN 
+A A V+++ LNGNV L D A F ++S F QI G KDT + L+++ WT+PS L N 



L L+N+T+TLNSAY + S+ +AP • L T PTS E RFNTLTVN 

WSTVTLNSAY SAS SNNAPRHRRS LETETTPTSAEHRFNTLTVN 899 

^GQGTFRFMSELFGYRSDKLKIAESSEGTYTLAVNNTGNEPASLEQLTVVEGKDNKP 278 
GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 



LS+ L FTL+N+HVDAGA 



LDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNR 60 
LDR+F + ++AVWT+ +D + Y S FRAY+Q+T+LRQIG+QK L +GR+G +FSH+R 



++NTFD+ + N A L + F QY k R+ ++YG 

SDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISASKMAEEQSRKIHRKAINYG 

IQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPA 
+ A Y+ G GI+P+ G RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P 

VNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSLAFNRYNAGIRVDYTFTPT 

QHISITPYLSLSYTDAASGKWTRVNTAVIAQDFGKTRSAEWGVNAEIKGETLSIJIAAAA 
+IS+ PY ++Y D ++ V+T VN VL O FG+ E G+ AEI F +S + + 
DN I S VKPY FFVN YVDVSNANV QTTVNLTVLQQ P FGRYWQKEVGLKAE I LH FQ I SAFI SKS 

KG PQLEAQHS AG I KLGYRW 259 
+G QL Q + G+KLGYRW 
Homology with a predicted ORF from N. gonorrhoeae 

The three blocks of ORF1 show 83.5%, 88.3% and 97.7% identity in 467, 298 and 259 aa overlaps, 
respectively, with a predicted ORF (ORF1ng) from N. gonorrhoeae: 



orf 1 . pep MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 60 

I I I I I I I M I I I I I! I I I M I I I I I I I I I I I I I I I I I I I J I I I I I I 1 | f I I I I I I I I 

orflng MKTT DKRTTETHRKAPKTGRIRFS PAYLAI CLS FG I LPQARAGHTYFG INYQYYRDFAEN 60 

orf 1 . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 120 

I I I I I I I I I II I 1 I i II I I I I I II I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I 

orflng KGKFAVGAKDIEVYNKKGELVGKSMTBCAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 120 

orf 1 . pep NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 180 

ilium i Hi muumimmiuiimum iiiumuu 

orflng NVDFGAEGSN-PDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSY 179 

orf 1 . pep MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIAS 223 

iii Hi miuiuuuuumimiiimuu 

orflng MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRE SS YHIASAYSWLVGGNTFAQNGSG 239 

orf 1 . pep GSPMFIYDA QKQKWLI NGVLOTGNPYIGKSNG 255 

I II II 11 I I It III Ul 11 II I I I II II II II 

orflng GGTVNLGSEKIKHSP YGFLPTGGSFGDSGSPMFIYDA QKQKWLIN GVLOTGNPYIGKSNG 289 



orf 1 .pep FQLVRKDWFYDEI FAGDTHSVFYEPRQNGKYS FNDDNNGTGKINAKHEHNSLPNRLKTRT 315 

I I i I I I I I I I I I M I M I I I I I I I J M I M I 111:111:111:111:1 III I I I t t I 
orflng FQLVRKDWFYDEI FAG DTHSVFYEPHQNGKYFFNDNNNGAGKI DAKHKHYSLPYRLKTRT 359 

orf 1 . pep VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 375 

I II II II II II I I I I I II I II II I II II II I II II III I I: II II I III II III II U U 
orflng VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 

orf 1 . pep FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGT 422 

I : I : I ill I : I Hi III I III II : I III III II I II II II II II II 
orflng FEGN FTVS PKNNETWQGAGVH I S DGSTVTWKVNGVANDRLS KIGKGTLLVQAKGENQGSV 479 

// 

or f 1 . pep DKVTASLTKTDI SGNVDLADHAHLNLTGLA 744 

III llhlll: III: I III I I I II I II I 
orflng FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSLADHAHLNLTGLA 774 

orf 1 . pep T LNGNLSANG DTR- YTVSHNATQNGNXSLVXNAQAT FNQATLNGNT SASGNAS FNL S DHA 803 

Ullll : : : : I I • I I II 11 I III I I I II 1 II I III I I I II I II II II I :: I 
orflng TFNGNL-VQAETRT IRLRANATQNGNLSLVGNAQATFNQATLNGNTSAS DNAS FNLSNNA 833 

orf 1 . pep VQNGSLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWT 8 63 

Ullillll 1 ! I I M I M I I I 1 1 M I I t M f II I I : I I 1 1 1 1 1 M I t M M I I I M i K i 
orflng VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 893 

or f 1 . pep LPSGXE LGNLNLDNAT IT LNSAYRHDAAGAQTGS ATDAPRRRSRRSRRS LLXVTPPT S VE 923 

I I 1 I ' S i t I I I I I t I I I I I 1 1 I I I I I I 1 I I I t I 1 I :! I I I I I I I E I II 111111:1 
orflng LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 950 

orf 1 . pep SRFNTLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 983 

M M I 1 M I II I M 1 1 I I M M I M 1 1 1 I II I I II III II II I II I II II II : 11 III I 
orflng SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLT 1010 

orf 1. pep WEGKDNKPLSENLNFTLQNEHVDAGAW 1011 

Ulilli 1 I I I I 1 1 I I t I I I t t f I I I I 
orflng WEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGET 1070 

// 

or f 1 . pep LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1211 

II t II III UMIIMM M I I M II MM 
orflng PQRDLI SRYANSGLSE FSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1239 

or f 1 . pep AYRQQTDLRQIGMQKNLGSGRVG I LFSHNRTENTFDDGIGN SARLAHGAVFGQYG I DRFY 1271 

I 1 1 I i I I 1 1 I 1 ! I 1 I I I I 1 1 1 I I I I 1 I I t I 1 MM Mill MM M MMIMM II 
orflng AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFD 1299 
orf 1 .pep IGI SAGAG FS SGS L S DG IGXKXRRRVLH YG I QARYRAGFGGFGIE PH IGATRYFVQKAD Y 1331 

N I I I I I I I I I I I I I I I I I | | | | | | | j | | | | | | | | | | | | | | | | || | | | | | | | | , | | | 
orf lng IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1359 

orf 1. pep RYENVNIATPGLAFNRYRAGIKADYSFKPAQHI S ITPYLSLSYTDAASGKVRTRVNTAVL 1391 

MINIMUM MMMMMIIIlllliillllllMMliNMIIMMI 

orf lng RYENVNIATPGLAFNRYRAGIKADYSFKPAQHIS ITPYLSLSYTDAASGKVRTRVNTAVL 1419 

orf 1. pep AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1440 

N N N I I I I I I I I I I I I II II II I II I II II || I | | | | | | | | | | | | | | 
orf lng AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1468 

The complete length ORF1ng nucleotide sequence was identified <SEQ ID 653>: 






1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA

51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT

101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT

251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC

301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG

351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC

401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT

451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT

501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA

551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC

601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC

651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA

851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC

901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC

951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG

1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT

1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA

1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG

1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT

1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG

1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC

1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT

1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT

1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC

1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC

1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG

2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA

2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA

2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC

2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA

2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA

2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG

2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG

2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG

2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC

2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA

2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG

2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC

2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA

2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC
2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC

2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC

3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA

3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg

3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc

3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc

3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA

3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg

3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG

3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA

3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG

3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG

3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC

3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC

3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA

3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC

3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC

3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA

3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC

3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC

3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT

3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT

4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG

4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC

4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT

4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG

4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC

4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA

4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG

4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG

4401 CTGGTAA



This is predicted to encode a protein having amino acid sequence <SEQ ID 654>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSWSRNG

101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT

151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF

301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS

351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK

401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK

451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT

551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD

601 ATKTNGGLNL NYPPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA

651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAWSRNVAK

701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS

751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL

801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS

851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL

901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE

951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG

1001 NEPVSLEQLT WEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR

1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA

1101 AGRNATEKAE SVAEPARQAG GEKAGIMQAE EEKKRVQADK DTALAKQREA

1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL

1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI

1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI

1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT

1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL

1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK

1451 GPQLEAQHSA GIKLGYRW*



Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP-binding site motif A (P-loop). 
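A scan for the ATP/GTP-binding site motif A can be sketched as follows. The sketch assumes the PROSITE consensus [AG]-x(4)-G-K-[ST] (entry PS00017) for the P-loop; the specification does not state which pattern or program was used, and the function name is illustrative. The test fragment is the ORF1ng stretch MFIYDAQKQKWLINGVLQTGNPYIGKSNGF, read as it appears in the ORF1-1/ORF1ng alignment.

```python
import re

# Assumed pattern: PROSITE PS00017, [AG]-x(4)-G-K-[ST], as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


# Residues 271-300 of the ORF1ng protein, by the numbering of the listing.
fragment = "MFIYDAQKQKWLINGVLQTGNPYIGKSNGF"
hits = find_p_loops(fragment)
print(hits)
```

The single hit, GNPYIGKS, starts at position 20 of the fragment, which corresponds to residue 290 by the numbering of the listing. Whether this is exactly the stretch underlined in the original cannot be recovered from the text, so the hit should be read as consistent with the stated P-loop rather than as confirmation of its position.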



ORF1-1 and ORF1ng show 93.7% identity in 1471 aa overlap: 
10 20 30 40 50 60 

orf 1-1 . pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I i I I I t I I I I I I I I 1 I I I I 
orflng-1 MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 1-1 . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

I MM III Ml MIMM MM II MM IM MMM Ml 1 IMM M I I I MINI Ml 
orflng-1 KGKFAVGAKDI EVYNKKGELVGKSMTKAPM I DFS WSRNGVAALAGDQYI VSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 1-1 . pep NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
IMMIII II II I 11:1:111111 Mill: III IM I II II I II I Ml I | lllllllll 
orflng-1 NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 1-1. pep DGRKY I DQNN YPDRVRIGAGRQYWRS DE DE PNNRE S S YH I AS AYSWLVGGNTFAQNGSGG 

II II i IM M MMM III! II Ml III IM M I IMMMIM I IM MINIMI 
orflng-1 DGWKYADLNKYPDRVRI GAGRQYWRS DE DE PNNRE S S YHI AS AYSWLVGGNT FAQNGSGG 

190 200 210 220 230 240 

250 260 270 280 290. 300 

or f 1-1 . pep GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
MINIM MINIMI MINIM Ml Ml MINI | M Ml MM I IM MINI Ml 
orflng-1 GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 1-1. pep QLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 
IIMIMIIIIIIIIIIIIIIIIhlllll 111:111:111:111:1 (MINIMI 
orflng-1 QLVRKDWFYDE I FAGDTH S VFYEPHQNGKYFFNDNNNGAGKI DAKHKHYS L P YRLKTRTV 

310 320 330 340 350 360 

370 380 390 400 410 420 

or f 1 - 1 . pep QLFNVSLSETAREPVYHAAGGVNS YRPRLNNGENI SFI DEGKGELILTSN INQGAGGLYF 

I MM I II III MMM II Mill I Ml Ml IMMM MMIM MM I IIIMII III 
orflng-1 QLFNVSLSET ARE PVYHAAGGVNS YRPRLNNGENI SFIDKGKGELILTSN INQGAGGLYF 

370 380 390 400 410 420 

430 440 450 460 470 480 

or f 1-1 . pep QGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSIS 
:l:lllll:IMIIIIMM!i: I I I I I I I i I I I II I II I I II I I I IMMIMMM 
orflng-1 EGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSVS 

430 440 450 460 470 480 

490 500 510 520 530 540 

or f 1- 1 . pep VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 
MM I M I M II I: I I I I I I I I I I II I M I I II M II I I M I M M I I M I I I I I I! I I 
orflng-1 VGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 1-1. pep SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 
M!IIIIIMMMIIIII!IIIIIIIIIIIIIM:ll I I I I : M I I I I I I I I I I I I I I I 
orflng-1 SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITTTGNNNNLDSKKEIAYNGWFGEKD 

550 560 570 580 590 600 

610 620 630 640 650 660 

or f 1-1 . pep TTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ 
'•MINIMI III I II I I I I I I I I II I I N I I I I I I I II I I I I I I I II I I : : I I : 
orflng-1 ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSK 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 1-1 . pep KEG I PRGE I VWDNDW INRT FKAENFQI KGGQAWSRNVAKVKGDWHLSNHAQAVFGVAPH 

IMIMINN IIMMIINN IMMM I NM II I IIMII III III NININI 
orflng-1 MEGI PQGE I VWDNDWI DRT FKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGVAPH 

670 680 690 700 710 720 
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10 



730 740 750 760 770 780 

orf 1-1 . pep QSHTICTRSDWTGLTNCVEKTITDDKVIASLTICTDISGNVDLADHAHLNLTGLATLNGNL 
(I I I I t I I I I I I I I I : I: I I I I i I I I I I I I I : I I I I I 1 I : I II I I I I I I I I I I I I I II I 
orflng-1 QSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLNGNL 

730 740 750 760 770 780 

790 800 810 820. 830 840 

orfl-l.pep SANGDTRYTVS HNATQNGNLS LVGNAQAT FNQAT LNGNTS ASGNAS FNLSDHAVQNGSLT 

11:111:1 M::l I IN I llll I M I MiMMI I Mil i I I I I i I I I I : : I I M i I II 
orflng-1 SAGGDTHYTVTRNATQNGNLS LVGNAQAT FNQAT LNGNTSASDNAS FNLSNNAVQNGS LT 

790 800 810 820 830 840 
850 860 870 880 890 900 

orf 1-1 . pep LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL 

II HUM M Ml II MINIMI II II: MM 1:11 M II I Ml II MM I I MUM 
orflng-1 LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL 

850 860 870 880 890 900 

910 920 930 940 950 960 

or f 1-1 . pep GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 
I M I I I I I I 1 I ! M I I I I I M I I I I I I I : I M I I I I I I I I I I I H I I I : I I II I I I I 
orflng-1 GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR RSLLSVT P PT S AE SRFNTLT 

910 920 930 940 950 

970 980 990 1000 1010 1020 

or f 1-1 . pep VNGKLNGQGT FRFMSELFGYRS DKLKLAE S SEGTYTLAVNNTGNEPAS LEQLTWEGKDN 

I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I II M I I I I : I I I I I I I I I I I I I 
orflng-1 VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 

960 970 980 990 1000 1010 

1030 1040 1050 1060 1070 

orf 1-1 . pep KPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKA 

II II HIM I MUM I lllllll II Ml MUM Mill I I Ml M II 
orflng-1 TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 

or f 1 - 1 . pep EAKKQAEKDNAQS LDALI AAGRDAVEKTES VAE PARQAGGENVG IMQAEEEKKRVQ 

I I : ! M I M i M I I I t I I t I I : I : I I : I I I I I I I I I I I M ) : I 1 I I i I t I II M I 
orflng-1 QAQLAAKQQAEKDNAQS LDALI AAGRNATEKAES VAE PARQAGGENAG IMQAEEEKKRVQ 

1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

orf 1-1 . pep ADKDTALAKQREAETRPATTAFPRARRARRDLPQLQPQPQPQPQRDLISRYANSGLSEFS 
I I I I I I I I I I I I II I I I I I I II I I I I II I I I M I I II I I I I I I I I I I I II I II J I I I I I 
orflng-1 ADKDTALAKQREAETRPATTAFPRARRARRDLPQPQPQPQPQPQRDLISRYANSGLSEFS 
1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

orfl-l.pep ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
I I I I I M I M I I M II I II M I I I I M I I I I I I M I I I I II I II I I M I II ! I 11 M I I I 
orflng-1 ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orf 1-1 . pep SGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGI 

III I I I I I I I I I I I I M I II II I I I I I II 1 I I M I I I II I I I I M I I I 1 1 II { I II I 
orflng-1 SGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSGSLSDGI 

1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

or f 1-1 . pep GGKI RRRVLHYG I QARYRAGFGGFG I E PH IGATRYFVQKADYRYENVN IAT PGLAFNRYR 

II 1 1 1 1 1 I 1 1 I 1 1 1 1 1 1 1 1 I II 1 1 I I HI 1 1 IN 1 1 M II 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 I 
orflng-1 RGKIRRRVLHYGI QARYRAGFGGFG I EPHI GATRYFVQKADYRYENVN IAT PGLAFNRYR 

1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

orf 1-1 . pep AGIKADYS FKPAQH I S IT P YLS LS YT DAAS GKVRTRVNTAVLAQDFGKTRS AEWGVNAEI 

I I I I II I H I II II II II II Hi I I I I II II I I I I I I I II II I II I II II II I I I I II H 
orflng-1 AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
1380 1390 1400 1410 1420 1430 
1430 1440 1450

orf 1-1 . pep KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 
1 1 I 1 1 1 II 1 1 II II M II li I II M II 1 1 II I 
orf lng-1 KG FT L S LHAAAAKG PQLEAQH SAG I KLG YRWX 

1440 1450 1460 



In addition, ORF1ng shows 55.7% identity with hap protein (P45387) over a 1455aa overlap:

SCORES Init1: 1104 Initn: 4632 Opt: 2680

Smith-Waterman score: 5165; 55.7% identity in 1455 aa overlap 

10 20 30 40 50 60 

MKTTDKRTTETHRKAPKTGRIRFS PAYLAICLS FGILPQARAGHTYFGINYQYYRDFAEN 

I :l: 1:1:11: II I I I I I I I I : I I I I I I I I I I 
MKKTVFRLNFLT ACI S LGI VSQAWAGHT YFG I DYQYYRDFAEK 
10 20 30 40 

70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 
II I |:|||::|:MI 1:1:1 II I I I I I I I I I I I I I I I II I ! I I : :|||llllll II: 
KGKFTVGAQNIKVYNKQGQLVGTSMTKAPMIDFSWSRNGVAALVENQYIVSVAHNVGYT 
50 60 70 80 90 100 

130 140 150 160 170 , 180 

NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
: M II I I I : I I I I II I : I : I II I I I I I I III ill I I I I II I I : I I : : M I I 
DVDFGAEGNNPDQHRFTYKIVKRNNYKKD-NLHPYEDDYHNPRLHKFVTEAAPIDMTSNM 
110 120 130 140 150 160 

190 200 210 220 230 240 

DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
:| hi :|||:l Ml 1:1 I I: I I: 1:1: : ::|:|| :|::||| I |:|: 

NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD QVAGAYHYLTAGNTHNQRGAGN 

170 180 190 200 210 

250 260 270 280 290 300 

GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

I II:: I : II II : I I I I I I I I I I I I I I : I I I I I I I I : I : III: II 111 
GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF 

220 230 240 250 260 270 

310 320 330 340 350 360 

QLVRKDW FYDE I FAGDTHS VFYEPHQNGKYFFN DNNNGAGK I DAKHKH YS LPYRLKTRT V 

II I II:: I III! I I: :| II I :: I: I I |:| I ::| ::| : 
QLVRKS YF-DE IFERDLHTSLYTRAGNGVYT I SGNDNGQGS ITQKS GIPSEIK 1 

280 290 300 310 320 

370 380 390 400 410 419 

QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 
11:11 :: I:: I I I 1 1 I I I I I : : I : I : : I II :: I : I I I II I I 1 1 
TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNNGETLYFMDQKQGSLIFASDINQGAGGLY 
330 340 350 360 370 380 

420 430 440 450 460 470 479 

orf lng-1 . pep FEGN FTVS PKNNETWQGAGVHI SDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 
I I I I I I I I I :: I : I I I I I I : I : I :: I I M I I I I I I : I I I I I I I I I II I I I I I II : I I : 
p45387 FEGNFTVSPNSNQTWQGAGIHVSENSTVTWKVNGVEHDRLSKIGKGTLHVQAKGENKGSI 
390 400 410 420 430 440 

480 490 500 510 520 530 539 

1 . pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
I I I I I INI: II II I I I: I Ml I III Ml II I I M II MM: I i : I I I I I M I I I I I I 
SVGDGKVILEQQADDQGNKQAFSEIGLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDLNG 
450 460 470 480 490 500 

540 550 560 570 580 590 

orf lng-1 . pep HSLSFHRIQNTDEGAMIVNHNQDBCESTVTITGNKDITT-TGNN-NNLDSKKEIAYNGWFG 
I I I : I : M I I I I M II I I I I I : ::IMIII::|: MM MM MMIIIMM 
p45387 HS LT FKRIQNT DEGAMI VNHNTTQAANVT I TGNE S I VLPNGNN INKLDYRKEI AYNGWFG 

510 520 530 540 550 560 



600 610 620 630 640 650

or f lna- 1 . pep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

9 | | : | | | | I M I : I 1 1 1 I I 1 1 I I I I 1 : 1 : 1 I I I : I I I I I I I I I I I 1 1 I I 1 1 " 

©45387 ETDKNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 

y 570 580 590 600 610 620 

660 670 680 690 700 710 

orf lng-1. pep WSKMEGI PQGEIVWDNDWI DRT FKAENFHIQGGQAVVSRNVAKVEGDWHLSNHAQAVFGV 
* ||:| till MM Ml: I I IMMMMIMM hi I I ill l:::M:l ' MM: I: Ml 

P45387 WSEMEGIPQGEIVWDHDWINRTFKAENFQIKGGSAWSRNVSSIEGNWTVSNNANATFGV 
630 640 650 660 670 680 

720 730 740 750 760 770 

orf lng-1 . pep APHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN 
:IM::MMIIIMMI:I : Ml Ml I: MM I::: I: I: I I: Ml M 
p4 5 3 8 7 VPNQQNT ICTRS DWTGLTTCQKVDLT DTKVINS I PKTQINGS INLTDNATANVKGLAKLN 

690 700 710 720 730 740 

780 790 800 810 820 830 

or f lna-1 . pep GNLSAGGDTHYTVTRNATQNGNLSLVGNAQAT FNQATLNGNTSAS DNAS FNLSNNAVQNG 
9 II:: : : : : M : I M M : I I 

D45387 GNVTL TNHSQFTLSNNATQIG 

750 7 60 770 

840 850 860 870 880 890 

orflna-l.pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG 

: : | | | | : I : I : : : I II II I M M I : : M : I : : I M I I x : I : : : II : II 
©45387 NIRLSDNSTATVDNANLNGNVHLTDSAQFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD 

780 790 800 810 820 830 

900 910 920 930 940 950 

orf lng-1 . pep TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT 
^ " ^ | | M M M : I I M I I I i s s I I I I I I : I Mill MM II 

P 45387 TTLQNLTLNNSTITLNSAY S AS SNNT PRRRS LETETTPTSAEHRFNTLT 

840 850 860 870 

960 970 980 990 1000 1010 

orf lng-1 . pep VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 
MM I: MM Ml I MUM MM:::: II I IM I MM I Ml I MM Mill 
p45387 VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYILSVRNTGKEPETLEQLTLVESKDN 
880 890 900 910 920 930 

1020 1030 1040 1050 1060 1070 

orf lng-1 . pep T PLSENLNFTLQNEHVDAGAWRYQLI RKDGE FRLHN PVKEQELS DKLGKAGETEAALTAK 
II I :MM MMM II I II MM: :M MUM II MUM : I M :M M M 
P 45387 QPLS DKLKFT LENDHVDAGALRYKLVKN DGE FRLHN P IKEQELHN DLVRAEQAERTLEAK 

940 950 960 970 980 990 

1080 1090 1100 1110 1120 1130 

orf lng-1. pep QAQLAAKQQAEKDNAQSLDALIAAGRNAT-EKAESVAEPARQAGGENAGIMQAEEEECKRV 
I : : : I I I : : : : : I I II : : : : : I MM M : : : : : I : I 
P45387 QVEPTAKTQTGEPKVRSRRAARAAFPDTLPDQSLI^ALEAKQAE-LTAETQKSKAKTKKV 
* ^ 1000 1010 1020 1030 1040 1050 

1140 1150 1160 1170 1180 1190 

or f lno-1 . pep QADK DTALAKQREAETRPATTAFPRARRARRD-LPQPQPQPQPQPQRDLISRYANSG 

: : : : | | : I : : : : : : : I I I : : I : I M M I 1 1 : 1 I : 

p45387 RSKRAVFSDPLLDQSLFALEAALEVIDAPQQSEKDRLAQEEAEKQ-RKQKDLISRYSNSA 
1060 1070 1080 1090 1100 1110 

1200 1210 1220 1230 1240 1250 

orf lng-1 . pep LSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQ-TDLRQIG 
MI:MI:M:::IIMIII:|::: ::MM: M : M I : II M : II I M II II 
p45387 LSELSATVNSMLSVQDELDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQQKTNLRQIG 
1120 1130 1140 1150 1160 1170 

1260 1270 1280 1290 1300 1310 

orf lng-1 . pep MQKNLGSGRVGILFSHNRTGNTFDDG IGNSARLAHGAVFGQYG IGRFDIGI SAGAG FS S G 
Ml I:: 1 1: I MM: I: MM: : I I I: : I: I I I : : : I : : : I : I : I : : 
p4 5387 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 
1180 1190 1200 1210 1220 1230 
1320 1330 1340 1350 1360 1370

orf lng-1 . pep S LS DG IRGKI RRRVLH YGI QARYRAGFGGFG IE PH IGATRYFVQKADYRYENVN IAT PGL 

I: :| : || : | : : | : : | | | : : : : j: |:| : ||:| 
P45387 KMAEEQSRKIHRKAINYGVNASYQFRLGQLGIQPYEX3VNRYFIERENYQSEEVRVKTPSL 
1240 1250 1260 1270 1280 1290 

1380 1390 1400 1410 1420 1430 

orf lng-1 . pep AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW 

Mill lll::||:| |:::||: II: : : | : | : : : : : | : | II :|| | It: : I 
p4 5387 AFNRYNAGIRVDYTFTPTDNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPEX3RYWQKEV 
1300 1310 1320 1330 1340 1350 

1440 1450 1460 1469 

orf lng-1 . pep GVNAE IKG FT LSLHAAAAKGPQLEAQHSAGI KLG YRWX 

1:1 : ::| II |:::|:||t||| 
p45387 GLKAEILHFQISAFISKSQGSQLGKQQNVGVKLGYRW 
1360 1370 1380 1390 

Based on this analysis, it is predicted that these proteins from N.meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 78 


The following partial DNA sequence was identified in N.meningitidis <SEQ ID 655>: 

1 . . AAGGTGTGGC AATTTGTCGA AGA . CCGCTG CGTGCCGTCG TGCCTGCCGA 

51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA 

301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG 

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC 

401 GTTTGAAAGT GTTCGGCGCA TAA 

This corresponds to the amino acid sequence <SEQ ID 656; ORF6>: 

1 ..KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK 
101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>: 

1 . . CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT 

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT 

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; ORF6-1>:

1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF
51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL 
101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A* 
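
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA codon by codon. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the choice to render codons with ambiguous bases as `X` are illustrative assumptions, not part of the original analysis.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    # Drop spaces, digits and other layout characters from the listing.
    # Assumption: the input has no '.' placeholders for missing bases,
    # since removing those would shift the reading frame.
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        # Codons containing ambiguity codes (e.g. R, S, K) become 'X'.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First ten and last five codons of SEQ ID 657.
print(translate("CTGCGTGCCG TCGTGCCTGC CGACAGTTTT"))  # LRAVVPADSF
print(translate("GTGTTCGGCG CATAA"))                   # VFGA*
```

The translation gives valine at both residues 4 and 5 (codons GTC and GTG), which agrees with the start and end of SEQ ID 658.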

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF6 shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) from strain A of N. 
meningitidis: 

10 20 30 
KVWQFVEXPLRAWPADSFEPTAQKLNLFK

1 1 I t I I 1 I I I 1 1 I I I I I I f 1 1 I 1 I I 1 t I 
QIVEHAVLHTPSSFNSQSARWVLFGEEHDKVWQFVEDALRAWPADSFEPTAQKLNLFK 

40 50 60 70 80 90 

40 50 60 70 80 90 

AGAATILFYEDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 

t M | 1 1 1 I I I I I I I I I I I I 1 1 1 I I II 1 1 t t I i tl 1 I I I I 1 1 1 1 1 1 1 1 I I 1 i 1 1 I I i 1 1 I I 
AGAATILFYEDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 

100 110 120 130 140 150 

100 110 120 130 140 

NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 

MM III IIMM I! II II HIM MUM I II Mil I I I M Ml M Ml I 
NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
160 170 180 190 200 

The complete length ORF6a nucleotide sequence <SEQ ID 659> is: 

1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA 

51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG 

101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC 

151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT 

201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG 

251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT 

301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC 

351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG 

401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT 

451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA 

501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG 

551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC 

601 GCATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 660>: 

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA 

51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY

101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH

151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG 

201 A* 





ORF6a and ORF6-1 show 100.0% identity in 131 aa overlap: 

50 60 70 80 90 100 

or f 6a . pep TPSSFNSQSARVWLFGEEHDKVWQFVEDALRAWPADSFEPTAQKLNLFKAGAATILFY 

M II I M I I M I It I III 1 1 1 1 I I M MM 
orf 6-1 LRAW PAD S FE PTAQKLNL FKAGAAT I LFY 

10 20 30 

110 120 130 140 150 160 

orf 6a . pep EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 
I || 1 1 1 1 I 1 1 \ I I I 1 1 I I I I II I I I 1 1 II II I 1 1 II I I II M 1 1 I I 1 1 I I I I I I I II II I 
orf 6-1 E DQN WKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYN PLPDAAIA 

40 50 60 70 80 90 

170 180 190 200 

orf 6a. pep KAWN I PENWLLRAQMV I GG I EGAAGEKTFE PVAERLKVFG AX 
Mill IIMM II I MIMM MMMM MIIMM I I III 
orf 6-1 KAWN I PENWLLRAQMV I GG I EGAAGEKTFEPVAERLKVFGAX 

100 110 120 130 
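
An overlap such as the one above, where a partial sequence matches an internal stretch of a longer one without gaps, can be located by sliding the shorter sequence along the longer one and counting identical residues. This is only a sketch of that idea, not the alignment program used for the analysis; the function name and the short fragments (residues 61-91 of SEQ ID 660 and the first 15 residues of SEQ ID 658, read with `VV` as encoded by the nucleotide listings) are chosen for illustration.

```python
def best_ungapped_offset(query: str, target: str) -> tuple[int, int]:
    """Return (offset, identities) for the best gap-free placement of query in target."""
    best = (0, -1)
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        identities = sum(q == t for q, t in zip(query, window))
        if identities > best[1]:
            best = (offset, identities)
    return best


orf6a_fragment = "DKVWQFVEDALRAVVPADSFEPTAQKLNLFK"  # ORF6a residues 61-91
orf6_1_start = "LRAVVPADSFEPTAQ"                    # ORF6-1 residues 1-15

offset, identities = best_ungapped_offset(orf6_1_start, orf6a_fragment)
print(offset, identities, len(orf6_1_start))  # 10 15 15
```

An offset of 10 within a fragment that starts at residue 61 places the start of ORF6-1 at residue 71 of ORF6a, in line with the numbering of the alignment.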

Homology with a predicted ORF from N.gonorrhoeae

ORF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (ORF6ng) from 
N. gonorrhoeae: 
orf 6 . pep KVWQFVEX PLRAW PADS FE PTAQKLNL FK 30

I II II II I II I II I II II I II l-l 1:1 II 
orf6ng SNVS LDMSN PTVLRMGLP LY IAS LRRGAI YKVWQFVE DALRAWPADS FE PTAQKLKL FK 64 

or f 6. pep AGAATI L FYE DQNWKGLQEQFPAYAAN FPVWADQANAMVQYAVWTT LAAVGVGANLQH Y 90

II Mllllllllll Ml I II II lllll I li MM II Mill INI III I 1 1 1:11 I MM 
or f 6ng AGAAT XL FYE DQNWKGLQEQFPAYAAN FPVWADQANAMVQYAVWTT LAAVGAGANLQHY 124 

orf 6 - pep NPLPDAAIAKAWN I PENWLLRAQMVIGGIEGAAGEKTFE PVAERLKVFGA 140 

I I I II : I I I II I M I I I I I I I I I M I I I I I I I I I I I : M I I I I I I I I I II

orf6ng NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFE PVAERLKVFGA 174 

The complete length ORF6ng nucleotide sequence <SEQ ID 661> was identified as: 

1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT 

51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC 

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC 

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG 

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG 

301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA 

401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC 

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga 

501 acgtttgAAA GTGTTCGGCG CATAA

This encodes a protein having amino acid sequence <SEQ ID 662>:

1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA
51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA
101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI
151 GGIEGAAGEK VFEPVAERLK VFGA*

ORF6ng and ORF6-1 show 96.9% identity in 131 aa overlap:

10 20 30 

orf 6-1 . pep LRAWPADS FEPTAQKLN LFKAGAAT I LFY 

M I M I I I II I I I I I I I : If I I 1 I I I M I I 
orf6ng PT VLRMGL PLYI AS LRRGAI YKVWQFVE DALRAWPADS FE PTAQKLKL FKAGAAT I LFY

20 30 40 50 60 70 

40 50 60 70 80 90 

orf 6-1 . pep EDQNWKGLQEQFPAYAAN FPVWADQANAMVQYAVWTTLAAVGVGANLQH YNPL P DAAI A 
I I I II I I I I I I I I I II I I I I I I I I M I I I I I I I I I I I I I I I II : I I I I I I I I I | I I : I ||

orf 6ng E DQNWKGLQEQFPAYAAN FPVWADQANAMVQYAVWTTLAAVGAGANLQHYN PLPDVAIA 

80 90 100 110 120 130 

100 110 120 130 

orf 6-1. pep KAWN I PENWLLRAQMVIGGI EGAAGEKT FE PVAERLKVFGAX

1 1 1 1 1 1 i 1 1 1 1 i 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 ^ i 1 1 1 J 1 1 1 1 1 i 1 1 i 

orf 6ng KAWN I PENWLLRAQMVIGGI EGAAGEKVFE PVAERLKVFGAX 

140 150 160 170 
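
The identity figure quoted for an overlap is the number of positions with identical residues divided by the overlap length. A minimal sketch of that arithmetic follows, assuming a gap-free overlap; the function name is hypothetical, and the two strings are ORF6-1 (SEQ ID 658) and residues 44-174 of ORF6ng (SEQ ID 662), read with `VV` where the nucleotide sequences encode two consecutive valines.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (identities, overlap length, percent identity) for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    overlap = len(a)
    identities = sum(x == y for x, y in zip(a, b))
    return identities, overlap, 100.0 * identities / overlap


# Sequences written in blocks of ten residues, as in the listings.
ORF6_1 = (
    "LRAVVPADSF" "EPTAQKLNLF" "KAGAATILFY" "EDQNVVKGLQ" "EQFPAYAANF"
    "PVWADQANAM" "VQYAVWTTLA" "AVGVGANLQH" "YNPLPDAAIA" "KAWNIPENWL"
    "LRAQMVIGGI" "EGAAGEKTFE" "PVAERLKVFG" "A"
)
ORF6NG_44_174 = (
    "LRAVVPADSF" "EPTAQKLKLF" "KAGAATILFY" "EDQNVVKGLQ" "EQFPAYAANF"
    "PVWADQANAM" "VQYAVWTTLA" "AVGAGANLQH" "YNPLPDVAIA" "KAWNIPENWL"
    "LRAQMVIGGI" "EGAAGEKVFE" "PVAERLKVFG" "A"
)

identities, overlap, pct = percent_identity(ORF6_1, ORF6NG_44_174)
print(f"{identities}/{overlap} = {pct:.1f}%")  # 127/131 = 96.9%
```

The four mismatches (N/K, V/A, A/V and T/V) account for the difference between 131 and 127.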

It is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 79 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 663>:

1 ..GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT 

51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGGJ CAATGCCAAC

101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT 

151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC 

201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA 
251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC

301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 

351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 

401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 

451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT 

501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 

551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA 

601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC. . 

This corresponds to the amino acid sequence <SEQ ID 664; ORF23>: 



1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL

51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX 

101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK 

151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN 

201 QDWKLKAEYD Y. . 

Further work revealed the complete nucleotide sequence <SEQ ID 665>: 



1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA
101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC
501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG
601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC
901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG
1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT
1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA
1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA
1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG
1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This corresponds to the amino acid sequence <SEQ ID 666; ORF23-1>:



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL

451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH

701 YRTQPDRHSY GALRTVNAAF TYRFK*






Computer analysis of this amino acid sequence gave the following results: 

Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047) 
ORF23 and PupB protein show 32% aa identity in 205aa overlap: 

FARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRK 65 
++RG I NY+++G+P + L D + + A ++RVE+VRG GL+ G G PSAT+NL+RK 



RLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFXXXXXXXXXXXXXXAE 125 
R T + + EAGN +G DVSG L +RGR V+ + 



Orf23 


6 


PupB 


215 


0rf23 


66 


PupB 


274 


Orf23 


126 


PupB 


334 


Orf23 


184 


PupB 


392 



+YGI E+D++ T + 



D+PL 



-SQGYATAFGPKDNPATNWAN 183 
S G T N A +W+ 



+ H 



+ F IE + 



W K E 






Homology with a predicted ORF from N.meningitidis (strain A)

ORF23 shows 95.7% identity over a 211aa overlap with an ORF (ORF23a) from strain A of N.
meningitidis: 

10 20 30 

orf 23 . pep GYN YLFARGSRIANYQINGI PVADALADTG 

IIMIIIMIlllMlllMMIIIIIIIt 
0rf23a QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG 
90 100 110 120 130 140 

40 50 60 70 80 90 

or f 23 . pep NANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD 
lilMIIIIIIIIIMH I Mill IlillMlllil I II I I I I I I I [ I I ! I I I I I I II 
orf 23a NANTAAYERVE WRGVAGLLDGTGE PSATVNLVRKRPTRKPLFEVRAEAGNRKH FGLGAD 

150 160 170 180 190 200 

100 110 120 130 140 150 

orf 23 .pep VSGSLNTEXXLRGRLVSTEX3RGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK 
111111:1 Mill Mill II MMIrll I I I I M I M I I I I I I 1 I I I 1 I I I I t t 1 1 I 
orf 23a V SGSLNAEGTLRGRLVS T FGRGDSWRQRERSRDAELYG I LE YD IAPQTRVHAGMDYQQAK 

210 220 230 240 250 260 

160 170 180 190 200 210 

or f 23 . pep ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD 
II Ml II III! Ill ill I I llllll 111111111:111! Mil It Mill Ml I III II I 
orf 23a ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD 
270 280 290 300 310 320 






orf 23. pep 
orf23a 



YTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTHSASVSLIGKYRLFGREHDLIA 
330 340 350 360 370 380 



The complete length ORF23a nucleotide sequence <SEQ ID 667> is: 
1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA
101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC
501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG
601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC
901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG
1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT
1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA
1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA
1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG
1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This encodes a protein having amino acid sequence <SEQ ID 668>: 



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap: 



10 20 30 40 50 60 

or f 23a . pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
i 1 1 I 1 I I 1 1 I I 1 1 I I I t 1 I I I 1 1 1 ! 1 I 1 1 1 I ! I I I I I I i I 1 1 I f 1 1 1 1 1 1 1 1 1 1 1 t I I I I 
orf23-l MTRFKYSLLFAALLPVYAQADVSVSDDPKPQE STELPTITVTADRTAS SNDGYTVSGTHT 

10 20 30 40 50 60 
70 80 90 100 110 120

PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG 

MMMMMMMMMMMMMMiMMMMMMMMMMMMMMMI 

PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
70 80 90 100 110 120 

130 140 150 160 170 180 

SR I AN YQING I PVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSAT VNLVRKRPTR 

iliiiiiiiiliitiiiiiiiiiiiiiilllitlllllliililillliilliilii li 

SRIAN YQING I PVADALADTGNANTAAYERVE WRGVAGLLDGTGE PS ATVNLVRKRLTR 
130 140 150 160 170 180 

190 200 210 220 230 240 

KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI 
I I I I I I I I I I I I I I I I II I I I I III I : I I I! I ! I I I M I I I I I I I I : I M I M I I I I I I 
KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

190 200 210 220 230 240 

250 260 270 280 290 300 

LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
I I I I I I I I I I I I II I I II I I II I I I II I I I M I I I I I I I I I I I I I I I I I II I I I I I I I I I 
LEYDI APQTRVHAGMDYQQAKETADAPLS YAVYDSQGYATAFGPKDN PATNWAN SRHRAL 

250 260 270 280 290 300 

310 320 330 340 350. 360 

NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLS I DHNTAATDLI PGYWHADPRTH 
I I M I II I I I I I I I I I I II I I M I I I I M I I I I I I I I I II i I I I M i I I 1 1 I Ml I II I I 
NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLS I DHNTAATDLI PGYWHADPRTH 

310 320 330 340 350 360 

370 380 390 400 410 420 

SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I II I I II I I I I I I I I I I I I I I I I I I I I 
SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 

370 380 390 400 410 420 

430 440 450 460 470 480 

FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGMTYVSANRFT 
I I II II I I I I Ml II I! ! II III M I I I III M M 1: I I M MM 1 t ! IIIIIIMIIII 
FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 

430 440 450 460 470 480 

490 500 510 520 530 540 

PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
I I I I I I I I I I I I I 1 I I I I 1 I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

490 500 510 520 530 540 

550 560 570 580 590 600 

AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
I I I I I I I I I I I I I I II I II I I I i I I I I I I I I I I I I I I I I I I I I I I I II I I M I I I I I I I I I 
AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 

550 560 570 580 590 600 

610 620 630 640 650 660 

DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
I I II II M II II II M I II II II I I I II II I I I I I I I I 1 I M I M I I I I II I II M I I II 
DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
I I I I II I I I I M I I I 1 I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I II 
ARAADNSRQECAYAVADIMARYRFNPRAELSI^DNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



TYRFKX
||||||
TYRFKX

Homology with a predicted ORF from N. gonorrhoeae

ORF23 shows 93.4% identity over a 211 aa overlap with a predicted ORF (ORF23.ng) from
N. gonorrhoeae:



orf23.pep          GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD  51
                   ||||||||||||||||||||||||||||||||||||||||||||||||| |
orf23ng   SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPD  60

orf23.pep GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 111
          ||||||||||||||  |||||||||||||||||||| |||||||| |  |||||||||||
orf23ng   GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 120

orf23.pep GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 171
          |||||  |||| ||||||||||||||||||| ||||||||||||||||||||||||||||
orf23ng   GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 180

orf23.pep GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY                     211
          |||||||||| ||  |||||||||||||||||||||||||
orf23ng   GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHS 240
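The identity figure quoted for this overlap can be re-derived from the aligned residues. The following is a minimal sketch, assuming a gap-free overlap in which undetermined residues (X) count as mismatches; the function name `percent_identity` and the two constants are illustrative and not part of the original analysis.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Overlapping region as printed in the alignment (hypothetical variable names).
ORF23 = (
    "GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD"
    "GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR"
    "GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF"
    "GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY"
)
ORF23NG = (
    "GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPD"
    "GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR"
    "GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF"
    "GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDY"
)

if __name__ == "__main__":
    print(f"{percent_identity(ORF23, ORF23NG):.1f}% identity over {len(ORF23)} aa")
```

Under these assumptions the script counts 197 identical positions out of 211, in agreement with the reported value.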



The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 670>: 



1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE

51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV

101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH

151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL

201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY 

251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA 

301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL 

351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL 

401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA 

451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ 

501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL 

551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR 

601 TQPDRHSYGA LRTVNAAFTY RFK* 

Further work revealed the complete nucleotide sequence <SEQ ID 671>:



1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA
101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC
501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG
601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC
901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG
1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCcgatcc GCGCACCCAC AGCGCCAGCA TGTCATTGAC
1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA
1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA
1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG
1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT
1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA
1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC
1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG
2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-1>:



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN
51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER
151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP
351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP
401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL
451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS
501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN
551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR
601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA
651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH
701 YRTQPDRHSY GALRTVNAAF TYRFK*
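The correspondence between the nucleotide sequence <SEQ ID 671> and the amino acid sequence <SEQ ID 672> can be spot-checked by translating codons in frame. The sketch below assumes the standard genetic code and translates only the first printed row and the last 18 bases; the helper `translate` and the constant names are hypothetical, and any codon containing an ambiguous base is reported as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First row and final 18 bases of the sequence as printed above.
FIRST_ROW = "ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA"
LAST_CODONS = "ACCTATCGGT TTAAATAA"

if __name__ == "__main__":
    print(translate(FIRST_ROW))    # N-terminal residues
    print(translate(LAST_CODONS))  # C-terminal residues and stop
```

The same function could be applied to the full 2168-base sequence; lowercase bases in the listing are handled by the upper-casing step.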



ORF23ng-1 and ORF23-1 show 95.9% identity in 725 aa overlap:



10 20 30 40 50 60 

or f 23-1 . pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
I M I I I I I I I I I I I I I I I I I I I I I I I II I! II I I I I I I I I I I I I I I I I I | | | | M | | I | | 
orf23ng-l MTRFKYS LLFAALLPVY AQADVS VS DD PKPQESTEL PT IT VT ADRT AS SN DGYTVSGTHT 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 23-1 - pep PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
I: I 1 I I I I I I I M I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
orf23ng-l PFGLPMTLRE I PQSVSVITSQQMRDQNIKTLDRALLQATGTSRQ I YGS DRAG YNYLFARG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 23-1. pep SRIANYQING I PVADALADTGNANTAAYERVEWRGVAGLLDGTGE PS ATVNLVRKRLTR 

I M I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I M I I I I : II 
orf23ng-l SRIANYQING I PVADALADTGNANTAAYERVE WRGVAGLPDGTGE PS ATVNLVRKHPTR 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 23-1 . pep KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

II I I I II II II I II I I I I II Ml I I 1:1 Mill III MM I I I III: inn HUM 
orf23ng-l KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGI 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 23-1 . pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
1 1 1 1 1 1 1 1 1 II I I 1 1 M 1 1 1 1 I 1 1 I I 1 1 II I 1 1 I I M 1 1 1 I 1 1 I II 1 1 1 1 1 I : I 1 1 : 1 I I 
orf23ng-l LE YDI APQTRVHAGMDYQQAKETADAPLS YAVYD SQGYATAFG PKDNPATNWSN SRNRAL 

250 260 270 280 290 300 

                   310       320       330       340       350       360
orf23-1.pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH
            ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
orf23ng-1   NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH
                   310       320       330       340       350       360



370 380 390 400 410 420 

orf 23-1 . pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
111:11 I I I I I I I I I I I I II I I I i I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I : I 
orf23ng-l S ASMS LTGKYRL FGREHDLI AG ING YKY ASNKYGERS 1 1 PN AI PN AYE FSRTGAY PQPS S 

370 380 390 400 410 420 






430 440 450 460 470 480 

orf 23-1. pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 

If Iflllli IN till I I illl! I IIIMrl! 1:1 11:1 Mill I Ml MM I 

orf23ng-l FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT 

430 440 450 460 470 480 

490 500 510 520 530 540 

or f 23-1 . pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
I I I I I I I I I I I II I I I I I 1 I I I I I I I II I I I I I I I I II I I I I I I I II I I I I I I I I I I I 
orf23ng-l PYTGIVFDLTGNLSLYGSYSSLFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNAS 

490 500 510 520 530 540 

550 560 570 580 590 600 

or f 23-1 . pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
1 I I I I I I M I I I I M I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I 
orf23ng-l AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPR 

550 560 570 580 590 600 






610 620 630 640 650 660 

orf 23-1. pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
I I I I I I I I I I I II I I II M I I I I I : M I II I I I I I I I I I I : I I M M I : I I I I I I I I I 
orf23ng-l DQDGSRLNPDSVPERS FKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

or f 23-1 . pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
Ml: M M I II I M I M II I I M II : I I I I II M I I I I I I I I I II I I I I I M I I I I I II 
orf23ng-l ARAVANSRQKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



orf23-1.pep TYRFKX
            ||||||
orf23ng-1   TYRFKX

In addition, ORF23ng-1 shows significant homology with an OMP from E. coli:

sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN, FE(III)-
FERRIOXAMINE B AND FE(III)-RHODOTRULIC ACID PRECURSOR >gi|1651542|gnl|PID|d1015403
(D90745) Outer membrane protein FhuE precursor [Escherichia coli]
>gi|1651545|gnl|PID|d1015405 (D90746) Outer membrane protein FhuE precursor
[Escherichia coli] >gi|1787344 (AE000210) outer-membrane receptor for Fe(III)-
coprogen, Fe(III)-ferrioxamine B and Fe(III)-rhodotrulic acid precursor
[Escherichia coli] Length = 729
Score = 332 bits (843), Expect = 3e-90

Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)

Query: 38  TITVTADRTASSN--DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95
           T+ V    TA  +  + Y+V+ T     + MT R+IPQSV++++ Q+M DQ ++TL   +
Sbjct: 43  TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102

Query: 96
Sbjct: 103
Query: 148
Sbjct: 155
Query: 207
Sbjct: 215
Query: 267



LQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIP- 
G S+ SDRA Y ++RG +1 NY ++GIP 



-VADALADTGNANTAA 147 

+ DAL+D A 
SLGDALSDM AL 154 



+ERVEWRG GL GTG PSA +N+VRKH T + 



+V AE G+ 



AD+ 



+G +R R+V + 



DSW 



GI++ D+ T + AG +YQ+ 



PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 32 6 






+++ G + ++ + A +W+ + +F ++ +F W+ ++ 

Sbjct: 275 WGGL PRWNTDGS SNS YDRARSTAP DWAYNDKE INKVETfTLKQQFADTWQAT LNATHSEVE 334 

Query: 327 F — RQPYGVAGVLS I DHSTAA — TDLI PGY WHADPRTHSA-SMSLTGKYRLFG 374 

F + YAVD ++ PG+ W++ R A + G Y LFG

Sbjct: 335 FDSKMMYVDAYVNKADGMLVGPYSNYGPGFDYVGGTGWNSGKRKVDALDLFADGSYELFG 394 

Query: 375 REHDLIAGINGYKYASNKYGER — SIIPNAIPNAYEFSRTGAYPQPSSFAQTIPQYDTRR 432 
R+H+L+ G Y +N+Y +1 P+ I + Y F+ G +PQ Q++ Q DT 

Sbjct: 395 RQHNLMFG-GSYSKQNNRYFSSWANIFPDEIGSFYNFN--GNFPQTDWSPQSLAQDDTTH 451

Query: 433 QIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTY-VSANRFTPYTGIVFDXXX 4 91 

Y ATR AD L LILG RY+ +R + +TY + N TPY G+VFD 

Sbjct: 452 MKS L Y AATRVTLADPLHL I LG AR YTNWRV DT LTYSMEKNHTT PYAGLVFDIND 504 

Query: 492 XXXXXXXXXXXFVPQLQKDEHGSYLKPVTGNNI£ADIKGEWLEGRLNASAAVYRARKNNL 551 

F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 
Sbjct: 505 NWSTYASYTSIFQPQNDRDSSGKYLAPITGNNYELGLKSDWMNSRLTTTLAIFRIEQDNV 564 

Query: 552 ATAAGR---DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608

A + G +G T Y+A + + G E E+ G IT WQ+ G ++ D +G+ +N 

Sbjct: 565 AQS TGT P I PGSNGETAYKAVDGTVS KGVE FELNGAIT DNWQLT FGATRYI AEDNEGNAVN 624 

Query: 609 PDSVPERSFKLFTAYHLAPEAPSGRT IGAGVRRQGETHTDPAALRI PNPAAKARAVANSR 668 
P ++P + K+FT+Y L P P T+G GV Q +TD P RA

Sbjct: 625 P-NLPRTTVKMFTSYRL-PVMPE-LTVGGGVNWQNRVYTDTV TPYGTFRA-- f — E 672 

Query: 669 QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724

Q +YA+ D+ RY+ L NV+NLF+K Y T + YG R + TY+F 

Sbjct: 673 QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 729
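The percentages in the BLAST summary line for this hit follow directly from the reported counts. As a minimal sketch, assuming the report truncates each ratio to a whole percent rather than rounding it (an assumption about the reporting convention, inferred from the figures themselves), the three values can be reproduced as follows; `blast_percent` and `SUMMARY` are illustrative names.

```python
def blast_percent(count: int, alignment_length: int) -> int:
    """Whole-number percentage, truncated (not rounded), as assumed for the report."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return count * 100 // alignment_length


# Counts taken from the summary line of the FhuE hit above.
SUMMARY = {"Identities": (228, 717), "Positives": (350, 717), "Gaps": (60, 717)}

if __name__ == "__main__":
    for label, (count, length) in SUMMARY.items():
        print(f"{label} = {count}/{length} ({blast_percent(count, length)}%)")
```

With rounding instead of truncation the first two figures would be 32% and 49%, which is why truncation is assumed here.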


Based on this analysis, it was predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF23-1 (77.5kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
15A shows the results of affinity purification of the His-fusion protein, and Figure 15B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 80

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 673>: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA

201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 

351 TnAGTCGCCG ACGGGG..

This corresponds to the amino acid sequence <SEQ ID 674; ORF24>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG..

Further work revealed the complete nucleotide sequence <SEQ ID 675>:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC 

351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG 

551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG 

601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT 

651 AGCGCAGCCG AAACCTTCGG GCGTGATTTC CGCCGTGCGT TTGACGGTTT 

701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG 

751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT 

801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG 

851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This corresponds to the amino acid sequence <SEQ ID 676; ORF24-1>:



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS

51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA

201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS

301 KVCATLT*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of N. 
meningitidis: 



10 20 30 40 50 60 

or f 24a . pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 
II I 1 I 1 I I I 1 I 1 I I I I I I 1 I 1 1 I I I I I 1 1 I I 1 I 1 t I I I I I t II : I I I I I : I I I I f I I M 
orf24 MRTAWLLL IMPMAAS S AMMPEMVCAGVS PGTAI ISKPTEQTAVMAS SLS S VST PASAAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 24a . pep II PSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 
MINI M I I I I I I I I I I I I I i i 1 I I ! I I I 1 M I I I I I I II I I I I I I I I I I I I I I I II I 
orf24 I IPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf24a.pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I II I I I I I I I I I I I I I I I I I I II II I I I 
orf24 TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 24a . pep PGPDTPTLI TAS AS PEPXNAPAIXGLS SXALQNTT ILAQPKPS SVI SXVRLMVS PASLTA 
I I 1 I 1 t 1 I t 1 t 1 1 I I I I i ! i 1 f I I H I: I I I II II I I I I M I :! I I Ml MINIM 
orf 24 PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVS PASLTA 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 24a. pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
I 1 I I I 1 t I I M 1 1 M I t I I I 1 I I I I I I I K MUM 111111:1111 I MINIMUM 
orf 24 SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
250 260 270 280 290 300

orf24a.pep KVCATLTX
           ||||||||
orf24      KVCATLTX

The complete length ORF24a nucleotide sequence <SEQ ID 677> is: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC 

151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA 

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAACCQATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG 

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG 

601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT 

701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG 

751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT 

801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG 

851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC

901 AAAGTTTGCG CCACGCTGAC GTAA

This encodes a protein having amino acid sequence <SEQ ID 678>:



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS

51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA

201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS

301 KVCATLT*

It should be noted that this protein includes a stop codon at position 198. 
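The position of the internal stop codon can be located by scanning the translated sequence for the stop symbol. The following sketch assumes the listing's convention that `*` marks a stop codon and that the final `*` is the normal terminator; the function `internal_stops` and the constant `ORF24A` are illustrative names.

```python
def internal_stops(protein: str) -> list[int]:
    """Return 1-based positions of stop symbols that precede the terminal stop."""
    seq = "".join(protein.split())
    body = seq[:-1] if seq.endswith("*") else seq
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


# Amino acid sequence <SEQ ID 678> as listed above (X = undetermined residue).
ORF24A = """
MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS
NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP
ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS
KVCATLT*
"""

if __name__ == "__main__":
    print(internal_stops(ORF24A))
```

The same scan applied to ORF24-1 would be expected to flag the same position, since both listings show `*` within the row beginning at residue 151.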



ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap: 

10 20 30 40 50 60 

or f 24 a . pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 
I I i I I I i I f I I f I t I I I I I I I i I I I t 1 1 I | | 1 | | | | II I I III: II I I !: Ml!! || I 
orf24-l MRTAWLLLIMPMAAS S AMMPEMVCAGVS PGTAI I SKPTEQTAVMAS S LS S VST PASAAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf24a.pep IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
IMIH I > 1 I I I I I I I 1 I I I I I I 1 ! t 1 I I I I t 1 I I ! I I 1 I I I I I II I I II III Hi Ml 
orf24-l 1 1 PSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPI SSRMRATESP 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 24a. pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
IN IMMI IIIIIIIIMIIMIIMIIIIIII I || III I III II I I Ml MMMIM 
orf24-l TAGVGAS DKSRI PNG I FS I FEASRPMS S PTRVI LKAVFFTT SAT SVN WASEFSNAAFTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf24a.pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 
M I I I I I I I I I I I I I I I I I I I I I II Ihllll II INI till:) I I III Mill III 
orf24-l PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf24a.pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
1 1 1 1 1 1 1 1 r I I I 1 I t ! 1 I | I | ! | f i | | | 1 I I 1 I I 1 I t I M I : t I I I I I I I II I 1 M I M 
orf24-l SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300

orf24a.pep KVCATLTX
           ||||||||
orf24-1    KVCATLTX

Homology with a predicted ORF from N. gonorrhoeae

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from 
N. gonorrhoeae: 

orf24.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA  60
          |||||||||||||||||||||||||||||||||| ||||||||||||||||| |||||||
orf24ng   MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA  60

orf24.pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPIXSRMRATXSP 120
          |||||||||||||||||||||||||||||||||||||||||||||||||| |||||| ||
orf24ng   IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP 120

orf24.pep TG                                                           122
          |
orf24ng   TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180



The complete length ORF24ng nucleotide sequence <SEQ ID 679> is: 



1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG 

451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG 

501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG 

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA 

601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT 

701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG 

751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC 

801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG

851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC 

901 AAAGTCTGCG CCACGCTGAC ATAA 

This encodes a protein having amino acid sequence <SEQ ID 680>: 



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS

51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA

201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS

301 KVCATLT*

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap: 



10 20 30 40 50 60 

or f 2 4 - 1 . pep MRTAWLLLIMPMAAS S AMMPEMVCAGVS PGTAI I SKPTEQTAVMAS SL S S VST PASAAA 
Mill. II II Ml M I I I I I II I I 111 Ml I Ml: I I M Mill M I 11111:1111111 
orf24ng MRTAWLLLIMPMAAS S AMMPEMVCAGVS PGTAIMSKPTEQTAVMAS SLS SVNTPASAAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 24 -1 . pep 1 1 PS SSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 
MMMII Mill Ml III IMIMIIMII Ml I Mil Ml ill I I II II Mill illl 
orf24ng 1 1 PS SSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 

70 80 90 100 110 120

130 140 150 160 170 180

or f 24-1 .pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

I I MIMMI 1:1 I Mill Mill 'III Ml III II I I MUNI I : : I I I I I : I I : I I 
orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 24-1 . pep PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 
IIIIMIIIIIItlill I I I I I I f I I I I I I I K I I I I I I t I I I I I I I I I 1 I MMIIM 
orf24ng PGPDTPTLITASASPEPWNAPAINGLSSTALQNTTILAQPKPSGVISAVRLMVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24-1. pep SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

II I I I I I I I I I M I I I I I I I I I I 1 1 I I I I I I I I I I I I I I ||: I I | | | | | | | | | | | | | | | 
orf24ng S ILI PARVLPILMELHTISWFIASGTERINTSSEGDI PFCTSAEKPPIKDTPMALAALS 

250 260 270 280 290 300 

orf24-1.pep KVCATLTX
            ||||||||
orf24ng     KVCATLTX

Based on this analysis, including the presence of a putative leader sequence (first 18 aa,
double-underlined) and putative transmembrane domains (single-underlined) in the gonococcal protein,
it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 81 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 681>:

1 . .ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA 

51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 

101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 

151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>: 

1 ..TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER 
51 IQYLRGYSID * 

Further work revealed the complete nucleotide sequence <SEQ ID 683>: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA 

351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC 

451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG

This corresponds to the amino acid sequence <SEQ ID 684; ORF25-1>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP 

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 

151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA 

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT 

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 
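The relationship between the partial sequence ORF25 <SEQ ID 682> and the complete ORF25-1 can be checked by locating the fragment within the full-length protein. This is a minimal sketch using plain substring search, with the leading `..` and trailing `*` of the listings omitted; the names `ORF25_PARTIAL`, `ORF25_1` and `locate` are illustrative.

```python
def locate(fragment: str, full: str) -> tuple[int, int]:
    """Return 1-based (start, end) of the first occurrence of fragment in full."""
    frag = "".join(fragment.split())
    seq = "".join(full.split())
    index = seq.find(frag)
    if index < 0:
        raise ValueError("fragment not found in full-length sequence")
    return index + 1, index + len(frag)


ORF25_PARTIAL = """
TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER IQYLRGYSID
"""

ORF25_1 = """
MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR
SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP
SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD
GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA
REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT
VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC
RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID
"""

if __name__ == "__main__":
    start, end = locate(ORF25_PARTIAL, ORF25_1)
    print(f"ORF25 corresponds to residues {start}-{end} of ORF25-1")
```

On these inputs the fragment maps to the C-terminal 60 residues of the 338-residue protein.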

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF25 shows 98.3% identity over a 60aa overlap with an ORF (ORF25a) from strain A of N. 
meningitidis: 



                                                10        20        30
orf25.pep                               TDVQKELVGEQRKWAQEKISNCRQAAAQAD
                                        |||||||||| |||||||||||||||||||
orf25a    VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD
         250       260       270       280       290       300

                  40        50        60
orf25.pep RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
          |||||||||||||||||||||||||||||||
orf25a    RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
         310       320       330

The complete length ORF25a nucleotide sequence <SEQ ID 685> is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT 

201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA 

351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC 

451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC 

601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA 

651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence <SEQ ID 686>: 

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR

51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP 

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 

151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA 

201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT 

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 

                   10        20        30        40        50        60
orf25a.pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF
           ||||||||||||||||||||||||||||||||||| || |||||||||||||||||  ||
orf25-1    MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF
                   10        20        30        40        50        60



70 80 90 100 110 120 

orf 25a . pep VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL 
I M I Mill t I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | I I | | | | 
orf 25-1 VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 25a . pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV 
IMIIMMI Mill III II I II IN I IMICIIIII Mill IIIIIMIIIIIIIIII 
orf 25-1 SDIVRQKTGGN VE FKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKS I V 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 25a . pep MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP 
I I I I I I I I I I I I I I ! | III MM MIMMMMMII IIICI IMIIIMI 
orf 25-1 MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 25a . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 
i 1 I M I I I I I I I I M I II I I I I I I I I I II I I M I I M I M I I I I I I I I M II I M I M i 
orf 25-1 DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 

I 

310 320 330 339 

orf 25a. pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYS I DX 

M I M M I I I II I II II I I I II I II II I I II I I I II I M 
or f 2 5- 1 RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYS I DX 

310 320 330 I 
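
The identity figures quoted for these alignments can be checked with a short calculation. The following is a minimal sketch, not the program used to generate the alignments: the function name `percent_identity` is hypothetical, and it assumes that an undetermined residue (X) never counts as identical and that every aligned column counts towards the overlap. The two rows are positions 181-240 of the ORF25a/ORF25-1 alignment above.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-"):
    """Return (identical, overlap, percent) for two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = 0
    overlap = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap and b == gap:
            continue  # column is empty in both rows, so it is not part of the overlap
        overlap += 1
        # Assumption: 'X' (undetermined residue) is never scored as an identity.
        if a == b and a not in (gap, "X"):
            identical += 1
    if overlap == 0:
        raise ValueError("rows share no aligned columns")
    return identical, overlap, 100.0 * identical / overlap


# Positions 181-240 of the ORF25a / ORF25-1 alignment
orf25a = "MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP"
orf25_1 = "MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP"

identical, overlap, percent = percent_identity(orf25a, orf25_1)
print(f"{identical}/{overlap} identical ({percent:.1f}%)")
```

Applied to all six blocks of the alignment (excluding the terminal stop column), the same counting gives 316 identical positions out of 338, which agrees with the 93.5% stated above.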

Homology with a predicted ORF from N. gonorrhoeae
ORF25 shows 100% identity over a 60aa overlap with a predicted ORF (ORF25ng) from
N. gonorrhoeae:



orf25.pep                                 TDVQKELVGEQRKWAQEKISNCRQAAAQAD   30
                                          ||||||||||||||||||||||||||||||
orf25ng     VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD  308

orf25.pep   RQEYAEYLKLQCDTRMTRERIQYLRGYSID   60
            ||||||||||||||||||||||||||||||
orf25ng     RQEYAEYLKLQCDTRMTRERIQYLRGYSID  338



The complete length ORF25ng nucleotide sequence <SEQ ID 687> is: 



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG

51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT

101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG

301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA

351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC

451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG

551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC

601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG

701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG

851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc

901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA

951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG

1001 GCTATTCCAT CGATTAG



This encodes a protein having amino acid sequence <SEQ ID 688>: 
1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP

101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD

151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*

ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 

                    10        20        30        40        50        60
orf25-1.pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF
            ||||||||||||||||||||||||||||||||||| |||:||||||||||||||||||||
orf25ng     MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf25-1.pep VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL
            |||||||||||||||||||||||||||||||||||||||||||||||:||||||||||:|
orf25ng     VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf25-1.pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV
            :|||:||||||||||||||||||||||:||::|||:|||||||:||||||||||||||||
orf25ng     ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf25-1.pep MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP
            ||||||| ||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf25ng     MIDGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf25-1.pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC
            || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf25ng     DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC
                   250       260       270       280       290       300

                   310       320       330      339
orf25-1.pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
            |||||||||||||||||||||||||||||||||||||||
orf25ng     RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
                   310       320       330

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

ORF25-1 (37kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
16A shows the results of affinity purification of the GST-fusion protein, and Figure 16B shows the
results of expression of the His-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and
that it is a useful immunogen.

Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1.
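
A profile of the kind plotted in Figure 16E can be approximated by averaging a per-residue scale over a sliding window. The sketch below is illustrative only: it uses the published Kyte-Doolittle hydropathy scale (positive values are hydrophobic, so hydrophilic stretches appear as negative values), which is an assumption and not necessarily the scale behind the figure. The function name `hydropathy_profile`, the window of 7 residues and the zero score for undetermined residues are likewise assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Mean hydropathy of every window of `window` consecutive residues."""
    # Assumption: residues outside the scale (e.g. 'X') contribute 0.0.
    scores = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in protein.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# First 20 residues of the ORF25-1 sequence shown in the alignments above
profile = hydropathy_profile("MYRKLIALPFALLLAACGRE", window=7)
for position, value in enumerate(profile, start=1):
    print(f"{position:3d} {value:6.2f}")
```

Plotting the returned values against window position gives a curve comparable in form to a hydrophilicity plot with the sign reversed.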



Example 82 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689>:

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T 

// 

851 AC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAA. . 

This corresponds to the amino acid sequence <SEQ ID 690; ORF26>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILXX VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDXDW SLGKPKILVF XILLGIFTSL LTYSGSN. . . 

// 

251 TSLV 

301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KK. . 

Further work revealed the complete nucleotide sequence <SEQ ID 691>:

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT 

751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 
1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT

1501 AAAAAACGCG CCAACGCCTG A 

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-1>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRANA*
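
The correspondence between a nucleotide sequence and its listed amino acid sequence can be checked by translating in the first reading frame. The following is a minimal sketch assuming the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical. As a simplification, any codon containing an undetermined base (N or another ambiguity code) is reported as X, which is how such positions appear in the listings above, and the stop codon is reported as `*`.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; whitespace between blocks is ignored."""
    sequence = "".join(dna.split()).upper()
    residues = []
    for start in range(0, len(sequence) - 2, 3):
        codon = sequence[start:start + 3]
        # Simplification: codons with undetermined bases are written as 'X'.
        residues.append(CODON_TABLE.get(codon, "X"))
    return "".join(residues)


# First 30 and last 21 nucleotides of <SEQ ID 691>
print(translate("ATGCAGCTGA TCGACTATTC ACATTCATTT"))
print(translate("AAAAAACGCG CCAACGCCTG A"))
```

The two fragments correspond to the first ten residues and the final `KKRANA*` of <SEQ ID 692>. Lower-case bases, as used in some of the listings, are handled by the upper-casing step.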

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical transmembrane protein HI1586 of H.influenzae (accession number P44263)
ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the 
N-terminus and C-terminus, respectively: 

Orf26 1 MQLIDYSHSFFSVVPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60

M+LID+S S +S+VP LA+ LA+ TRRV L +L V 

HI1586 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73

0rf26 61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 

V L ++D + + I++F +LLG+ T+LLT SGSN 

HI1586 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109 

// 

0rf26 86 IFTSLLTYSGS — NTSLVFGGTCGVFAWLCTL — GTIKT ADYPKAVWQG AKSMFGXXXX 141 

+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
HI1586 299 VFSVLGT FENT WGT SLWGGFCS 1 1 1 STLLI I LDRQVSVPE YVRSWI VGIKSMSGAIAI 358 

0rf26 142 XXXXXXXSTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLP 201 

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGT SWGTFGIMLP 
HI1586 359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418 

0rf26 202 IAAAMAVKVE PALI I PCMS AVMAGAVCGDHCS PIS DTT ILS STGARCNHI DHVT S QXXXX 261 

IAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
HI1586 419 IAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478 

0rf26 262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302 

S L GF T + L V+IF +K + 
HI1586 479 ATVATAT SIGY I WG FT YS GLAGFAATAVSLI VI I FAVKKR 519 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf26.pep   MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV
            ||||||||||||||||||||||||||||||||||||||  ||||||||||||||||||||
orf26a      MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
                    10        20        30        40        50        60

                    70        80        90       99
orf26.pep   VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSNXX
            ||||||| |||||||| ||| ||||||||||||||||
orf26a      VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
                    70        80        90       100       110       120
70 80 90 100 110 120 
orf26.pep
orf26a      LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA
                   130       140       150       160       170       180

orf26.pep
orf26a      TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                   190       200       210       220       230       240

                                                           100       110
orf26.pep                                                           TSLV
                                                                    ||||
orf26a      AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
                   250       260       270       280       290       300
                   120       130       140       150       160       170
orf26.pep   FGGTCGVFAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
            |||||||:||||||||||| ||||||||||||||||||||||||||||||||||||||||
orf26a      FGGTCGVLAVVLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
                   310       320       330       340       350       360

                   180       190       200       210       220       230
orf26.pep   STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
            ||||||||||||| |||||||||||||||||||||||||||||||||||:|:||||||||
orf26a      STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA
                   370       380       390       400       410       420

                   240       250       260       270       280       290
orf26.pep   VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26a      VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
                   430       440       450       460       470       480

                   300       310
orf26.pep   LLGFGTTGIVLAVLIFLLKDKK
            ||||| ||||||||||||||||
orf26a      LLGFGXTGIVLAVLIFLLKDKKRANAX
                   490       500



The complete length ORF26a nucleotide sequence <SEQ ID 693> is: 



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA

201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC

401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT

651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC

751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC

801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT

851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC

951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA

1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT

1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA

1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG

1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC

1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC

1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT

1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT

1501 AAAAAACGCG CCAACGCCTG A

This encodes a protein having amino acid sequence <SEQ ID 694>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS

151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG

251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD

501 KKRANA*

ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap: 



                    10        20        30        40        50        60
orf26a.pep  MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1     MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf26a.pep  VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
            |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf26-1     VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf26a.pep  LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA
            |||||||||||||||||| ||||||||||||:||||||||||||||||||||||||||||
orf26-1     LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf26a.pep  TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1     TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf26a.pep  AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
            |||||||||:: ||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1     AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf26a.pep  FGGTCGVLAVVLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
            ||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||
orf26-1     FGGTCGVLAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf26a.pep  STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA
            ||||||||||||| |||||||||||||||||||||||||||||||||||:|:||||||||
orf26-1     STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf26a.pep  VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1     VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
                   430       440       450       460       470       480

                   490       500
orf26a.pep  LLGFGXTGIVLAVLIFLLKDKKRANAX
            ||||| |||||||||||||||||||||
orf26-1     LLGFGTTGIVLAVLIFLLKDKKRANAX
                   490       500



Homology with a predicted ORF from N. gonorrhoeae

ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-terminus,
respectively, with a predicted ORF (ORF26ng) from N. gonorrhoeae:
orf26.pep
orf26ng 
orf26.pep 
orf26ng 

orf26.pep 
orf26ng 
or f 2 6. pep 
orf26ng 
or f 2 6. pep 
orf26ng 
orf26.pep 
orf26ng 



VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 9 ; 

// 

TSLVFGGTCGVFAVVLCTLGTIKTADYPKA 
| | I I I I I I II I : I I I I I I : HI I I I I I I I I 
ASTVSAMI YTGAQASET FS ILGAFENT DVNTSLVFGGTCGVLAWLCT FGT IKTADYPKR 

WQGAKSMFGAIAILII^LISTWG^TGDYLSTL^ 
ATGTSWGTFGIMLPI AAAMAVKVEPALI I PCMSAVMAGAVCGDHCS PI S 



326 
326 
386 
386 



446 



446 



502 



5*6 



The complete length ORF26ng nucleotide sequence <SEQ ID 695> is: 



1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG

101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CtTGGTTTTC CTGATACTTT

251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG

551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT

601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT

651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG

701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT

751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT

851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC

951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT

1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtCCCGTG

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA

1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC

1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT

1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT

1501 AAAAAACGCG CCGACGTTTG



This encodes a protein having amino acid sequence <SEQ ID 696>: 
1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWADGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RCGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AQDETAASDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTFGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRADV*

ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap:
orf26-1.pep
orf26ng 

or f 2 6-1. pep 
orf26ng 

or f 2 6-1. pep 
orf26ng 

orf26-l.pep 
orf26ng 

orf26-l.pep 
orf26ng 

orf 26-1. pep 
orf26ng 

orf26-l.pep 
orf26ng 

orf 2 6-1. pep 
orf26ng 

orf 2 6-1. pep 
orf26ng 



10 20 30 40 50 60 



100 



110 



120 



VGLAWSDGWSLGKPKILVFLILLGIFTSLLTYSGOTQAF^ 

I I I I I • I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I II I Ml 
VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADW 

80 90 ioo no 



70 



120 



170 



180 



130 140 150 160 

LVFVTFIDDYFHSIAVGAIARPVTDKFKVSRTK^ 

. , , 1 1 1 1 I i i I i | | 1 1 1 1 I 1 1 1 1 1 1 1 1 I 1 1 1 : 1 1 : 1 I I I I I I 1 1 1 1 1 1 1 1 1 1 

130 14 o 150 160 170 ±vv 



220 



230 



240 



TIAGLLvJSlTEYTPM^ 
n l ! I I I I I 1 1 I I I I I I I I I IN 1 1 1 1 1 M I I I I I I I M M H t I I M I M 1 1 I I I I I N 

190 200 210 220 230 



280 



290 



300 



250 260 270 

AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTO 

I . I I I I . I I I I I I 1 I I I I I I | | | | | | I I I H M I I I I I I I I I I I II I I I I I I I I I I I J 

AQDETAASDATKGRWALIIPV^ 

250 260 270 ? ft0 290 



280 



290 



350 



360 



310 320 330 340 

FGGT CGVLAWLCTLGT IKTADY PKAVWQGAKSMFGAIM LI ^WLI ^ YY??^T??T^ 1 
I niii 11111111:11 I 1 1 II II II Ml II 1 1 II III NIMH III I Mill II MM 

310 320 330 1411 - bu JbU 



340 
400 



410 



420 



370 380 390 

STLVAGN IHPGFLPVI LFLLASVMAFATGTSWGT FGIMLPI AAAMAVKVE PALI I PCMSA 
MM II 1 1 1 1 I I I 1 1 M 1 1 I M I M II I I I M M M I I M I M 1 1 M I II M II 1 1 I Ml 

370 380 390 400 410 



470 



480 



430 440 450 460 

VMAGAVCGDHCS PI S DTTIIiS STGARCNH I ^^^^^^TY^^^? < f T^^^T^?^ 
i i I I M I | I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I 

430 440 450 460 470 4bU 

490 500 
LLGFGTTGI VLAVLI FLLKDKKRANAX 
1 1 1 1 It II M M II M I M I M I M : 
LLGE*GTTGIVLAVLI FLLKDKKRADVX 
490 500 

In addition, ORF26ng shows significant homology to a hypothetical H.influenzae protein:
sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037 hypothetical
protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi|1574427 (U32832) H.
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519
Score = 538 bits (1370), Expect = e-152
Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%)






Query: 1 MQLIDYSHSFFSVVPPFLALALAVITRRXXXXXXXXXXXXXAFLVGGNPVDGLTHLKDMV 60

M+LID+S S +S+VP LA+ LA+ TRR L +L V 

Sbjct: 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Query: 61 VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120

V L +ADG+ + I++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A 

Sbjct: 74 VSLVTADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKLLAAS 132 

Query: 121 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 180

LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+PMCV+MPVSSWGA II 
Sbjct: 133 LVFVTFIDDYFHSLAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192 

Query: 181 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 240

+ GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL 
Sbjct: 193 LIGGLLATYSITEYTPIGAFVAMSSMNFYAIFSIIMVFFVAYFSFDIASMVRHEKLALKN 252 

Query: 241 AQDETAASDATKGRVYALI I PVLALIASTVSAMIYTGAQA SETFSILGAFENTDVN 296 

+D+ TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V 

Sbjct: 253 TEDQLEEETGTKGQVRNLI LPI LVLI I ATVSMMI YTGAEALAADGKVFSVLGT FENT WG 312 

Query: 297 TSLVFGGTCGVL — AWLCT FGT I KTADY PKAVWQGAKSMFGXXXXXXXXXXXSTWGEM 354 

TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M 

Sbjct: 313 TSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372 

Query: 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414 

TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA P L+ 
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432 

Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474 

+PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIW 4 92 

Query: 475 XXXKSALLGFGTTGIVLAVLIFLLKDK 501 

S L GF T + L V+IF +K + 
Sbjct: 493 GFTYSGLAGFAATAVSLIVI IFAVKKR 519 



Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 83 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 697>:

1 ..AAGCAATGGT ATGCCGACGN .AGTATCAAG ACGGAAATGG TTATGGTCAA

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT 

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT. GTTTATCAGG ATGACAAGTT 

201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>: 

1 ..KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGVVLEW
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP*

Further work revealed the complete nucleotide sequence <SEQ ID 699>: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA

201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT 

601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-1>:

1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV

51 VAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS

201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF27 shows 91.5% identity over a 82aa overlap with an ORF (ORF27a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf27.pep                                 KQWYADXSIKTEMVMVNDEPAKILTWDESG
                                          |||||| :||||||||||||||||||||||
orf27a      LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG
                140       150       160       170       180       190

                    40        50        60        70        80
orf27.pep   RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX
            ||||||||:|| ||||||||||||||| | |||||||||||||| ||||||||
orf27a      RLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX
                200       210       220       230       240

The complete length ORF27a nucleotide sequence <SEQ ID 701> is:

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG 

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG 

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT 

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 702>: 



1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV

51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES

151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS

201 IHHHXRNGVV LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP*

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap: 

                     10        20        30        40        50        60
orf27a.pep   MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF
             ||||||||||||||||||||||| |||||||||||||| |||||||||||: |||||| |
orf27-1      MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf27a.pep   XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP
              ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf27-1      YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf27a.pep   NGKKSAVMPYKNGLSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVN
             ||||||||||||||||||| ||||||||||||||||||||||||||||||:|||||||||
orf27-1      NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf27a.pep   DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDG
             |||||||||||||||||||||:|| ||||||||||||||| |||||||||||||||| ||
orf27-1      DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                    190       200       210       220       230       240

orf27a.pep   YLIEPX
             ||||||
orf27-1      YLIEPX



Homology with a predicted ORF from N. gonorrhoeae 

ORF27 shows 96.3% identity over 82 aa overlap with a predicted ORF (ORF27ng) from 
N. gonorrhoeae: 

orf27.pep                                  KQWYADXSIKTEMVMVNDEPAKILTWDESG 30
                                           |||||| |||||||||||||||||||||||
orf27ng      LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG 193

orf27.pep    RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP 82
             |||||||||||:||||||||||||||||| ||||||||||||||||||||||
orf27ng      RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 245

The complete length ORF27ng nucleotide sequence <SEQ ID 703> is: 



1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC
51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA
101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG
151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA
201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC
251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA
301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA
351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT
401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC
451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA
501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG
551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT
601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG
651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA
701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA



This encodes a protein having amino acid sequence <SEQ ID 704>: 



1 MKKLSRIVFS IVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV
51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK
101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES
151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS
201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap: 



                     10        20        30        40        50        60
orf27-1.pep  MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
             |||||||||| |||||||||||||||||||||||||||||||||||||||:|||||||||
orf27ng      MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf27-1.pep  YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng      YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf27-1.pep  NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng      NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf27-1.pep  DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
             ||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||
orf27ng      DEPAKILTWDESGRLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                    190       200       210       220       230       240

orf27-1.pep  YLIEPX
             ||||||
orf27ng      YLIEPX
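
The identity percentages quoted for these overlaps can be recomputed directly from a pair of aligned sequences. The sketch below is an illustration only and is not the alignment program used for this analysis; it assumes the two strings are already aligned and of equal length, with `-` marking gaps, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Residues 1-60 of ORF27-1 and ORF27ng as shown in the alignment above.
    orf27_1 = "MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF"
    orf27ng = "MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF"
    print(round(percent_identity(orf27_1, orf27ng), 1))
```

Applied to the full 245-residue overlap, three differing positions would give 242/245, i.e. 98.8%, in agreement with the figure stated above.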

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF27-1 (24.5kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
17A shows the results of affinity purification of the GST-fusion protein, and Figure 17B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA, which gave a positive result, confirming that ORF27-1 is 
a surface-exposed protein and a useful immunogen. 

Example 84 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 705>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG 

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT 

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC 

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence <SEQ ID 706; ORF47>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VM

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 



1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA
51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG
101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG
151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC
201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG
251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT
301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG
351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT
401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC
451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA
501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA
551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT
601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT
651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG
701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC
751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC
851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA
951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC
1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA



This corresponds to the amino acid sequence <SEQ ID 708; ORF47-1>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS
201 PKWVAQASLW LPMLTAMLMA HGVLAWLSAV FAFAAGVIFT VQVYRWWYKP
251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT
301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH
351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF47 shows 99.4% identity over a 172 aa overlap with an ORF (ORF47a) from strain A of N. 
meningitidis: 

                     10        20        30        40        50        60
orf47.pep    MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEMIWGYAGLVV
             ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf47a       MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf47.pep    IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a       IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                     70        80        90       100       110       120

                    130       140       150       160       170
orf47.pep    MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM
             ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a       MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                    130       140       150       160       170       180

orf47a       GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
                    190       200       210       220       230       240
The complete length ORF47a nucleotide sequence <SEQ ID 709> is: 

1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC 

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT 

651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This encodes a protein having amino acid sequence <SEQ ID 710>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS
201 PKWVAQASLW LPMLTAMLMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP
251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT
301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH
351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap: 

                     10        20        30        40        50        60
orf47a.pep   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1      MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf47a.pep   IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1      IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf47a.pep   MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1      MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf47a.pep   GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
             |||||||||||||||||||||||||||||||||||||||||||: ||||:||||||||||
orf47-1      GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf47a.pep   VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1      VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf47a.pep   LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1      LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                    310       320       330       340       350       360

                    370       380
orf47a.pep   LALLVYAWKYIPWLIRPRSDGRPGX
             |||||||||||||||||||||||||
orf47-1      LALLVYAWKYIPWLIRPRSDGRPGX
                    370       380

Homology with a predicted ORF from N. gonorrhoeae 

ORF47 shows 97.1% identity over 172 aa overlap with a predicted ORF (ORF47ng) from 
N. gonorrhoeae: 

ORF47        MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ORF47ng      MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60

ORF47        IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120
             |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
ORF47ng      IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 120

ORF47        MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM 172
             ||||||||||:||||||||:||||||||||||||||||||||||||||||||
ORF47ng      MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI 180

The ORF47ng nucleotide sequence <SEQ ID 711> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 712>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG
101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVWGFIGLI GMKIISFFTS KRLKLPQIPS
201 PKWVAHASLW LPMLNAILMA HRVMPWLSAA FPFAAGVIFT VQVYAGGITP
251 IEETSCGSVA GICYRLGNSS G

The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala 
substitution at residue 87 and a Leu/Ile substitution at position 140) to sequences in the 
meningococcal protein (see also Pseudomonas stutzeri orf396, accession number e246540): 

TM segments in ORF47ng 

INTEGRAL    Likelihood = -5.63    Transmembrane    52 -  68
INTEGRAL    Likelihood = -3.88    Transmembrane   169 - 185
INTEGRAL    Likelihood = -3.08    Transmembrane    82 -  98
INTEGRAL    Likelihood = -1.91    Transmembrane   134 - 150
INTEGRAL    Likelihood = -1.44    Transmembrane   107 - 123
INTEGRAL    Likelihood = -1.38    Transmembrane   227 - 243
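
As an illustration of how hydrophobic stretches of this kind can be located, the sketch below scans a sequence with a sliding window and the Kyte-Doolittle hydropathy scale. This is a substituted, simplified technique and not the prediction program that produced the likelihood values listed above; the 17-residue window (chosen to match the segment lengths in the list) and the 1.6 threshold are assumptions, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle value over a segment."""
    return sum(KD[res] for res in segment) / len(segment)


def hydrophobic_windows(seq: str, window: int = 17, threshold: float = 1.6):
    """Return (start, end, score) for each 1-based window at or above threshold."""
    hits = []
    for i in range(len(seq) - window + 1):
        score = mean_hydropathy(seq[i:i + window])
        if score >= threshold:
            hits.append((i + 1, i + window, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Residues 51-80 of the ORF47ng sequence given above.
    region = "MIWGYAGLVVIAFLLTAVATWTGQPPTRGG"
    for start, end, score in hydrophobic_windows(region):
        # Offset by 50 to report positions in full-length numbering.
        print(start + 50, end + 50, score)
```

With these assumed settings, the window covering residues 52 to 68 scores well above the threshold, consistent with the first segment in the list, although the scores are not comparable to the likelihood values shown there.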



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>: 

1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG 

251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT 

401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC 

451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT 
651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG
701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC
751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC
851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA
951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC
1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA



This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-1>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG
101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GMRIISFFTS KRLNVPQIPS
201 PKWVAQASLW LPMLTAILMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP
251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT
301 LGMMARTALG HTGNSIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH
351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47ng-1 and ORF47-1 show 97.4% identity in 384 aa overlap: 

                     10        20        30        40        50        60
orf47-1.pep  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1    MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf47-1.pep  IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
             |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
orf47ng-1    IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf47-1.pep  MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
             ||||||||||:||||||||:||||||||||||||||||||||||||||||||||||||||
orf47ng-1    MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf47-1.pep  GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
             | ||||||||||||||||||||||||||||||||||:||||||: ||||:||||||||||
orf47ng-1    GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf47-1.pep  VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1    VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf47-1.pep  LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
             |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1    LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                    310       320       330       340       350       360

                    370       380
orf47-1.pep  LALLVYAWKYIPWLIRPRSDGRPGX
             |||||||||||||||||||||||||
orf47ng-1    LALLVYAWKYIPWLIRPRSDGRPGX
                    370       380



Furthermore, ORF47ng-1 shows significant homology to an ORF from Pseudomonas stutzeri: 



gnl|PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396 
Score = 155 bits (389), Expect = 5e-37 
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%) 

Query: 7 PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFY WHAHEMIWGYAGLV 59 

P+W +AFRPF+ +LY L++ LW +TG GF WH HEM++G+A + 

Sbjct: 14 P I WRLAFRP FFLAG S L Y ALLAI PLWVAAWTGLW P — GFQPTGGWLAWHRHEMLFGFAMAI 71 

Query: 60 VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119 

V FLLTAV TWTGQ G LVGL A WLAAR+ ++ G AA L LF 
Sbjct: 72 VAGFLLTAVQTWTGQTAPSGNRLVGLAAWLAARL-GWLFGLPAAWLAPLDLLFLV^ 130 

Query: 120 CMALPVIRSQNRRNYVAVFAIFVLGGTHAAFXXXXXXXXXXXXXXXXXXXXXMVSGFIGL 179 

MA + + +RNY V + ++ G +V+ + L 

Sbjct: 131 MMAQMLWAVTlQKItfl YP I WVLSLMLGADVLI M 190 



Query: 180 IGMRIISFFTSKRLNVPQIPSP-KWVAQASLWLPMLTAILMAHGV MPWLSAAFAFA 234 

IG R+I FFT + L P W+ A L + A+L A GV PL FA 

Sbjct: 191 IGGRVIPFFTQRGLGKVDAVKPWVWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249 

Query: 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293 
GV +++ RW+ K + K +LW L L+ + + +F A 

Sbjct: 250 IGVGHLLRLMRWYDKGIWKVGLLWSLHVAMLWLWAAFGLALWHFGLLAQS S PSLHALSV 309 

Query: 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAFWLXXXXXXXXXXXXFSSGTAYTHSIR 353 
M+AR LGHTG + P+AFL FS + 

Sbjct: 310 GSMSGLILAMIARVTLGHTGRPLQLPAGIIG-AFVL FNLGTAARVFLSVAWPVGGLW 365 

Query: 354 TSSVLFALALLVYAWKYIPWLIRPRSDGRPG 384 

++V + LA +Y W+Y P L+ R DG PG 
Sbjct: 366 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 396 


Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 85 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 715>: 

1 ..ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT 

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG JcmksAsyTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy 

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG 

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT 

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG 

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT 

551 ATTCTCCAGC CGCCGAAATC 

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>: 

1 ..MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVVVT VSGVXXQLGX
51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF
101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF
151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI ..

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF67 shows 51.8% identity over 199 aa overlap with a predicted ORF (ORF67ng) from 
N. gonorrhoeae: 
orf67.pep 
orf67ng 

orf67.pep 
orf 67ng 
orf67.pep 
orf67ng 
orf 67. pep 
orf67ng 



MPSEGS DGXGXGEXEXYAHAQXDFVGFEAG 
1 1 I I } I 1 1 I II i HIM I1MI1I 
TN FEI AVLSGMTVRVFYCARPAPVNGGRLKMP SEGSDGIG I GESEAVAHAQRGFVGFEAG 
90 100 HO 120 130 



140 



VFQASPVWTVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX 

I I I I I I I I i • I • 1 1 I I | | : : : : : 1 I I I I *• i I I : : 
VFQAS PVWAV AGVQGQAGRDVYAHARHRAE AQAAAAVAFLI GV FLRMSVRINRNCCVS I 

XWXXXXSRGFXXHRMNLMFNVSVGDARADIGFEFIVEFEIVNGGQAERRNGVEAAVSI^F 

I : |:: : : | | | 1 1 | I : I I 1 1 I I : I M M M I I I I M I i 1 1 I II Ml 
TRVGGKSTCYFFSRIDAVSDVSVGDARTDIGFEFWEFEIVNGGQAERRNGVECAVFLMF 

CLGFFW WYLFSNFFSRRITFF-PFSVTGIICRYSPAAEI 

I I I :: |: I: : I : II Mill Mill: 

RLLVFYVKLVAAKS FI I LS FQLFYVHGI FI WPFPVTGI IRGDAPAAEVVADRHPGVDGM 



30 



146 



90 



206 



150 



266 



190 



326 



The ORF67ng nucleotide sequence <SEQ ID 717> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 718>: 



1 MPSETVGSIV NVGVDESVGF SPPFPSIQHF YRFHRIHRIR LFRPPGPMQL
51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR
101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA
151 SPVVVAVAGV QGQAGRDVYA HARHRAEAQA AAAVAFLIGV FLRMSVRINR
201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF VVEFEIVNGG
251 QAERRNGVEC AVFLMFRLLV FYVKLVAAKS FIILSFQLFY VHGIFIVVPF
301 PVTGIIRGDA PAAEVVADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI
351 IVGNAFGGVG



Based on the presence of several putative transmembrane domains in the gonococcal protein, it 
is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 86 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 719>: 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA. . . 

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>: 

1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
51 HIMFAVGMLG VLVGDGIMFA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAA...

Further work revealed the complete nucleotide sequence <SEQ ID 721>: 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC 

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT 

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT 
501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG 

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-1>: 

1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAALISVP
151 IWIYLGEYGA HNIDWLMAKM HSLQSGIFVI LGIGATVVAW IWWKKRQRIQ
201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ*

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H.influenzae (accession number P45280) 
ORF78 and the dedA homologue show 58% aa identity in 144aa overlap: 

Orf78:   4 FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGV 61
           FL  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GV
DedA:   20 FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79

Orf78:  62 LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121
           L GD  M+  GRI+G   L F PI  I+T  R   V+EKF +YGN VLFVARFLPGLR
DedA:   80 LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139

Orf78: 122 VFVTAGISRKVSYLRFIIMDGLAA 145
           +++ +GI+R+VSY+RF+++D  AA
DedA:  140 IYMVSGITRRVSYVRFVLIDFCAA 163



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF78 shows 93.8% identity over a 145 aa overlap with an ORF (ORF78a) from strain A of N. 
meningitidis: 

                     10        20        30        40        50        60
orf78.pep    MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
             |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78a       MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf78.pep    VLVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRT
             |||||||||||||||||  | | ||| |||| || |||||||||||||||||||||||||
orf78a       VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
                     70        80        90       100       110       120

                    130       140
orf78.pep    AVFVTAGISRKVSYLRFIIMDGLAA
             |||||||||||||||||:|||||||
orf78a       AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
                    130       140       150       160       170       180

The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 



1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 
501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence <SEQ ID 724>: 



1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP
151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ
201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap: 

                     10        20        30        40        50        60
orf78a.pep   MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
             |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78-1      MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf78a.pep   VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
             ||||||||||||||||||||:||||||||||||| |||||||||||||||||||||||||
orf78-1      VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf78a.pep   AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
             |||||||||||||||||:||||||||||||:|||||||||||||||||||||||||||:
orf78-1      AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI
                    130       140       150       160       170       180

                    190       200       210       220
orf78a.pep   LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX
             ||: |:::||:||:||:: |:||::|:||||:||| ||||||||::||
orf78-1      LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX
                    190       200       210       220



Homology with a predicted ORF from N. gonorrhoeae 

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from N. 
gonorrhoeae: 

orf78.pep    XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137
                                           ||||||||||||||||||||||||||||||
orf78ng                                  YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32

orf78.pep    IIMDGLAA 145
             :|||||||
orf78ng      LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 

1 ..YPVLFVARFL PGLRTAVFVT AGISRKVSYL RFLIMDGLAA LISVPVWIYL
51 GEYGAHNIDW LMAKMHSLQS GIFIALGVLA AALAWFWWRK RRHYQLYRAQ
101 LSEKRAKRKA EKAAKKAAQK QQ* 

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>: 



1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT 

101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT 

201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG 
351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT

401 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 

This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-1>:

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLAGDGVMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP

151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78ng-1 and ORF78-1 show 88.1% identity in 227 aa overlap:



10 20 30 40 50 60 

orf 78-1 . pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
1 1 I : I 1 1 I 1 1 I 1 1 1 1 1 I I 1 1 I 1 1 1 1 1 1 1 II 1 1 I II I I I I I I I I 1 1 I I I I I 1 1 I 1 1 1 1 1 1 I 
orf78ng-l MFALLEAFFVE YGYAAVFFVLVI CGFGVPI PE DLTLVTGGVI SGMG YTN PHIMFAVGMLG 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 78-1. pep VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 

I 1 : I I I : I I I I I I I I I I I I 1 : I I i I M I I I I I M II I I I I I I II I ! I I I II II I I I I II 
orf78ng-l VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 78-1 . pep AVFVTAG I SRKVS YLRFI IMDGLAALI SVP IWI YLGEYGAHNI DWLMAKMHSLQSGI FVI 

II Ml II 1111111111:111 llllllll 1:111111 III II I MM II lllll Nil: 
orf78ng-l AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 

130 140 150 160 170 180 

190 200 210 220 

orf 78-1 . pep LG I G AT WAW I WWKKRQR I QFYRSKLKEKRAQRKAAKAAKKAAQSKQX 
II: I:::ll:ll:l|:: I : I I : : I : II I I : I I I }|Milil::ll 
0rf78ng-l LGVLAAALAW FWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 

190 200 210 220 

Furthermore, ORF78ng-1 shows homology to the dedA protein from H. influenzae:



sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20)
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212
Score = 223 bits (563), Expect = 7e-58

Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%)

Query: 5 LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62

L FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+ N H+M V M+GVL 

Sbjct: 21 LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80 

Query: 63 AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122

AGD M+ GRI+G KIL+F+PI RI+T +R V+EKF +YGN VLFVARFLPGLR +
Sbjct: 81 AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140 

Query: 123 FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182

++ +GI+R+VSY+RF+++D AA+ISVP+WIYLGE GA N+DWL ++ Q I+I +G 
Sbjct: 141 YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200 

Query: 183 VL 184 
L 

Sbjct: 201 YL 202 

Based on this analysis, including the presence of putative transmembrane domains, it is predicted
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.
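
The specification does not state which method was used to predict the putative transmembrane domains. Purely as an illustration, the following sketch scans a protein with a Kyte-Doolittle hydropathy window. The window length of 19 residues and the threshold of 1.6 are conventional values assumed here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full-length window (unknown residues score 0)."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_tm_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean > threshold]

# First 50 residues of ORF78ng-1 <SEQ ID 728>, as listed in this example.
orf78ng_1_start = "MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNP"
print(candidate_tm_windows(orf78ng_1_start))
```

A window that scores above the threshold is only a candidate membrane-spanning segment; it is not evidence of surface exposure by itself.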



Example 87 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 729>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C... 

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>: 



1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH. . 

Further work revealed the complete nucleotide sequence <SEQ ID 731>:



1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-1>:

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH 
151 HGEAHQH* 
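
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the open reading frame with the standard genetic code. The sketch below is a minimal illustration, not the software used for this work; `translate` is a hypothetical helper, and it is applied to the first 30 and last 24 nucleotides of <SEQ ID 731>.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate in frame 1; codons that are not plain A/C/G/T become 'X'."""
    dna = "".join(dna.split()).upper()   # drop the spacing used in the listings
    usable = len(dna) - len(dna) % 3     # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))

# Nucleotides 1-30 and 451-474 of <SEQ ID 731> (both start on a codon boundary).
print(translate("ATGAAAAAAT TATTGGCGGC CGTGATGATG"))  # MKKLLAAVMM
print(translate("CACGGCGAAG CGCATCAGCA CTAA"))        # HGEAHQH*
```

The two outputs agree with residues 1-10 and 151-157 of <SEQ ID 732>, including the terminal stop shown as `*`.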

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF79 shows 94.6% identity over a 147aa overlap with an ORF (ORF79a) from strain A of N. 
meningitidis: 



10 20 30 40 50 60 

orf 7 9 . pep MKKLLAAVMMAGIxAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 
It I II I I I I I I I I I I I I I I I: I I I I I I I I I I I I I I I :l I I I I I I I I I I I II I I I I I I I I 
or f 7 9a MKXLLAAVMMAGLAG AV S AAG I HVE DGWARTT VEGMKMGG AFMKI HN DEAKQD FLLGGS S 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 7 9 . pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 

1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 i 1 1 1 1 f 1 1 1 1 1 inn urn 

or f 7 9a PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
70 80 90 100 110 120 

130 140
orf 7 9 . pep VTLKFKNAKAQTVQLEVKIAPMPAMNH 
II I I I I I I I I I I I I I I I I III ||:| 
or f 7 9a VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 

130 140 150 

The complete length ORF79a nucleotide sequence <SEQ ID 733> is: 

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTGATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This encodes a protein having amino acid sequence <SEQ ID 734>: 

1 MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH 
151 HGEAHQH* 

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 

10 20 30 40 50 60 

or f 7 9a . pep MKXLLAAVMMAGIAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS 
M M II I Ml! II I IMI 11:1 IN IIMI IIIMCI I I I i I I I M I I I It I I I I I t I 
orf 7 9-1 MKKLIAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 7 9a . pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
IN II Mill I I II III I I MM MM MINIM IMI I II MINI Mill IIMI 
orf 7 9-1 PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 

70 80 90 100 110 120 



130 140 150 

orf 7 9a . pep VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 
MliiMMMMIMM III I I : I I I I I I I I I I I t 
orf 79-1 VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 

130 140 150 
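
The identity figure quoted for this pair can be re-derived from the two listed sequences. The sketch below is an illustration under two assumptions: the sequences align without gaps (they have the same length), and an ambiguous residue `X` is counted as a mismatch. The helper name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Return (matches, percent) for two gap-free sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    # An ambiguous residue 'X' never counts as a match.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "X")
    return matches, 100.0 * matches / len(seq_a)

# ORF79a <SEQ ID 734> and ORF79-1 <SEQ ID 732>, without the stop symbol.
orf79a = ("MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEA"
          "KQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPG"
          "SYHVMFMGXKKQLKXGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMDHGHH"
          "HGEAHQH")
orf79_1 = ("MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEA"
           "KQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPG"
           "SYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNHGHH"
           "HGEAHQH")

matches, pct = percent_identity(orf79a, orf79_1)
print(f"{matches}/{len(orf79a)} = {pct:.1f}%")  # 149/157 = 94.9%
```

Under these assumptions the eight differing positions (3, 22, 38, 109, 115, 139, 143 and 146) reproduce the 94.9% stated above.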



Homology with a predicted ORF from N. gonorrhoeae

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N.gonorrhoeae: 

orf 7 9 . pep FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101 

II II I M II II I : I I I II I II I II I I II II 
orf79ng INDNGVMRMREVKGGVPLEAKSVTELKPGS 30 

orf 79. pep YHVMFMGLKKQLKEGDKI PVTLKFKNAKAQTVQLEVKI APMPAMNH 147 

MMIIMIMIIIIIMMIMIMIIMMMIM III MM 
orf79ng YHVMFMGLKKQLKEGDKI PVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86 

An ORF79ng nucleotide sequence <SEQ ID 735> was predicted to encode a protein comprising 
amino acid sequence <SEQ ID 736>: 

1 . . INDNGVMRMR EVKGGVPLEA KSVTELKPGS YHVMFMGLKK QLKEGDKIPV 
51 TLKFKNAKAQ TVQLEVKTAP MSAMNHGHHH GEAHQH* 



Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>: 

1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT

51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg 

101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc 

151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA 

201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-1>:

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA

51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG

101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH

151 HGEAHQH*

ORF79ng-1 and ORF79-1 show 95.5% identity in 157 aa overlap:

10 20 30 40 50 60 

orf 7 9-1 . pep MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKI GGAFMKIHN DEAKQDFLLGGS S 
M I I I I I I 1 1 II I I I I 1 I I I I I I I 1 I I I I I I ! I I I I I : I I I I I I I I I | | I 111:1111 
orf79ng-1 MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM

10 20 30 40 50 60 

70 80 90 100 110 ! 120 

or f 7 9-1 . pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
I I I I I I I I I I I I I I I I M I I M I : I I I I I I M I I I I I I I I II I I M I M I I I I I I I I I I I

orf79ng-l PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 

70 80 90 100 110 120 j 

130 140 150 

orf79-1.pep VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX

I I I I I I I M I I I I I I I I I III I I M I I I I I I I I I I I 
or f 7 9ng- 1 VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX 
130 140 150 

Furthermore, ORF79ng-1 shows significant homology to a protein from Aquifex aeolicus:

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151

Score = 63.6 bits (152), Expect = 6e-10 

Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%) 

Query: 24 VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83 
V+ W G M I N+ D+++G +A RVE+H + +N V +M

Sbjct: 27 VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86 

Query: 84 KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137 
+ + + K E K YHVM +GLKK++KEGDK+ V L F+ + TV+ V
Sbjct: 87 ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139

Based on this analysis, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF79-1 (15.6kDa) was cloned in the pET vector and expressed in E.coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 18A shows
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure
18B). These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a useful
immunogen.

Example 88

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
739>: 

1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC

401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>: 

1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV

151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK 

201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ* 
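
The partial nucleotide sequence <SEQ ID 739> contains lower-case IUPAC ambiguity codes (for example `y` and `s`), and the translation shows `X` at some of the affected codons but not others. The specification does not describe how these were resolved; one plausible rule, sketched here as an assumption, is to report a residue when every expansion of the ambiguous codon encodes the same amino acid and `X` otherwise. All names in the block are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate_codon(codon):
    """Residue if all expansions of the codon agree, otherwise 'X'."""
    try:
        options = [IUPAC[base] for base in codon.upper()]
    except KeyError:
        return "X"                      # not an IUPAC nucleotide symbol
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"

def translate_ambiguous(dna):
    dna = "".join(dna.split())
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))

# In-frame fragments of <SEQ ID 739> that contain an ambiguity code.
print(translate_ambiguous("TGGACGATyGCTTTC"))     # WTIAF  (ATy is Ile either way)
print(translate_ambiguous("GTGGACGAAsCATTGAAA"))  # VDEXLK (sCA is Pro or Ala)
```

Both results agree with the corresponding stretches of <SEQ ID 740> (`WTIAF` and `VDEXLK`), which is consistent with, but does not prove, this rule having been applied.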

Further work revealed the complete nucleotide sequence <SEQ ID 741>: 

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-1>:

1 MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF98 shows 96.1% identity over a 233aa overlap with an ORF (ORF98a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 98 . pep MTVTAAEGGKAAKALKKYLITGILWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
M M I I I I I I I M I I I I I I I I | M M M I II I I I I I I I I I I I I I I M M I I ! I I | i | | 
orf 9 8a MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 

10 20. 30 40 50 60 

70 80 90 100 110 120

orf 98 . pep GFNIPGLGVIVAIAVLFVTGLFAANVIXSRQILAAWDSLI^RIPVVKSIYSSVKKVSEYVL 
I I I I I I I II I I I I I I t I I I I II I I I I I I I I I I I I I I I 11 I I I I II I I I I I I I II I I : I 
orf 98a GE^IPGLGVIVAIAVLEVTGLFAANVLGRQII^WDSIJLGRIPVVKSIYSSVKKVSXSLL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 98 . pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 
I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I M I I I I I I II 
orf 98a SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf 98 . pep IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQX 
I II II I III I II Mill ! I II I I 111 I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 98a IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 

190 200 210 220 230 

The complete length ORF98a nucleotide sequence <SEQ ID 743> is: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG ' 

301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This encodes a protein having amino acid sequence <SEQ ID 744>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 

10 20 30 40 50 60 

orf 98a. pep MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
III IIIIMIIIMMIMIIIIIIIIIIIIIIIIMIIIIMIIIIIIIIIIIIIIII 
orf 98-1 MTEXAAEGGKAAKALKKYLITGILWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 98a . pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSXSLL 
I II I I I I II II I II I I I I I I I II I I II I I I II I I I I I I I II 1 I I I I I I I I I I I I II 111 
orf 98-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQItiAAWDSLLGRIPWKSIYSSVKKVSESLL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 98a . pep SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
I I I I I I I I I I I I I I I I I I I I I I 1 I I I II II I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
orf 98-1 SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf 98a . pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I II I I III II I I I I I I 11 
o r f 9 8 - 1 IMVKKS DVRELDMSVDEALKYVI S LGMVI PDDLPVKTLAGPMPSEKADLPEQQX 

190 200 210 220 230 

Homology with a predicted ORF from N. gonorrhoeae

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from 
N. gonorrhoeae: 

10 20 30 40 50 60 

orf98.pep MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60

M 1 I I I I I I I M I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | I I I I I 
orf98ng MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60 

orf 98 .pep GFNIPGLGVIVAIAVLFVTGLFAANVIXSRQIIJ^WDSLI^RIPVVKSIYSSVKKVSEYVL 120 
II MINIM II Mil II III II III! Ill I II Mill I I I II I MM ! I Mil I I :|

orf98ng GFNIPGLGVIVAIAVLFVTGLFAANVIX2RQILAAWDSLLXRIPWKSIYSSVKKVSESLL 120 

orf 98 .pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 180 
M 1 1 I 1 1 M I I II M M I M I I II II Ml I I I M I II I II I I I I I I I I I I II I I II I I 
orf98ng SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 180

orf 98 .pep IMVKKS DVRELDMSVDEXLKYVI SLGMVI PDDLPVKTLAXPMPSEKADLPEQQ 233 

M II I f I I II I II M II I I I I I I I I 1 I K I I I I I I I 1 1 I III 111:1 MM 
orf98ng IMVKKS DVRELDMSVDEALKYVI SLGMVI PDDLPVKTLAGPMPPEKAELPEQQ 233 

The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein
having amino acid sequence <SEQ ID 746>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLX

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ*

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ACCAGCTTGT CAACCTGCTG

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT 

201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg 

301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT 

501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC

651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-1>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

ORF98ng-1 and ORF98-1 show 97.9% identity in 233 aa overlap:

10 20 30 40 50 60

or f 98-1 . pep MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
Ml I I I I I I I I I I I I )( t II I I I I I I I I I I I I I 1 I I i I t t II I 1 ! I I II I I I II 1 i I i I 
orf98ng-l MTEPAAEGGKAAKALKKYLITGILVWLPI AVTVWWS YIVS ASDQLVNLLPKQWRPQYVL 

10 20 30 40 50 60 






70 80 90 100 110 120 

or f 98-1 . pep GFN I PGLGV I VAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKS I YS SVKKVSESLL 
I M M I I 1 11 I II I I M I I I I I I | M 1 1 II I II M M I II I I II M I I I I I I I M I M M 
orf98ng-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 98-1 • pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
II! II II I II M Ml II I II II MM I IIIMII I M I: I I II MINIMI lllll I I 
orf98ng-l SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 
130 140 150 160 170 180 

190 200 210 220 230 

or f 98 -1 . pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I i I II M I I I M I I I : I I I I I I 
orf98ng-l IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX 
190 200 210 220 230 

Based on this analysis, including the fact that the putative transmembrane domains in the
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.



Example 89 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 749>:

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GsGgTACTCA 

201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG 

251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT 

301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC 

351 G^gAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC 

401 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA 

451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC 

501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG 

551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA . ATTCGTTAC 

601 GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT 

651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT 

701 GGGCATATCC GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA 

751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG 

801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC 

851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA 

901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG 

951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT 

1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA 

1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG 

1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG 

1151 GCGGAGGCGC AC. . . 

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>:

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF

101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA 

351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 751>:

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA 

201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG

251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT

301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCCGGACAGA

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG

451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG

701 CATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC

1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG

1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG

1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC

1201 GCAGCGTTAG AGCAGCATAG CTGA



This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>: 



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF

101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH

401 AALEQHS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF100 shows 93.5% identity over a 386aa overlap with an ORF (ORF100a) from strain A of N.
meningitidis:



10 20 30 40 50 60 

orf 100 . pep MKT WW I WL FAAAVGLALAS G I YTG D VY I VLGQTMLR I N LHAFVLG S L I A VWW Y FL FK 
M 1 I I I I I I I I I I I I I I I I I | I | | | | I | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 
orf 100a MKTVWIWLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 100 . pep FI IGVLN I PEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 
I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I || I I | | | | | | | || | || : Ml 
orf 100a FIIGVLNXPEKMQRFGSARKGRKA71LALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 100 . pep TLAI^LXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
MMM IMMIIIII I I I I I I I I I I I I I I 1 I I I II I I I I I I II I I I I I I i I I I I I I 
orf 100a TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLIiAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 100 . pep AAAKMNANLTRLVRLX I R YA FDRG DALQVLAKTEKL S KAGALGKS EME R YQNWAYRRQ LA 

I I I I I I I I I I I I I I I : I I II I I I I I I I I I I II I I II I I I I I I I I I I I I II I I I I I I 
orf 100a AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 100 . pep DAADA/^ALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | 1 1 | I | | | | 
orf 100a DAADAAALKTCLKRI PD S LKNGELSVS VAEKYERLGL YADAVKWVKQHY PHNRR PELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360

orf 100 . pep EVESVRFLGEREQQKAI DFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEAS IAL 
I I | I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 : II I I I I I I I I I I I I M I I 
or f 1 0 0a EVE SVRFLGERDQQKAI DFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEAS IAL 

310 320 330 340 350 360 

370 380 
orf 100 .pep KPS ISARLVLTKVFDEIGEPQKAEAH 
llll! I 1111:11 III Mill III: 
orf 100a KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSAETHX 

370 380 390 400 

The complete length ORF100a nucleotide sequence <SEQ ID 753> is:



1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT

51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGG&C

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT

151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA

201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG

251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT

301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG

351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG

451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC

651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG

701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT

851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC

1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG

1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG

1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT

1201 TCCGCCGAAA CCCATTGA



This encodes a protein having amino acid sequence <SEQ ID 754>: 



1 MKTVVWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP 

401 SAETH* 

ORF100a and ORF100-1 show 95.1% identity in 406 aa overlap: 



10 20 30 40 50 60 

orf 100a . pep MKTVVWIV^FAAAXGLAIASGIXTGDVYIVLGQTMLRINLHAFVLGSLIA 

I I I I I I I I I I i I I I 1 I I I II II I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 100-1 MKT WWI WLFAAAVG LALASG I YTGDVY I VLGQTMLRINLHAFVLGS L I AWVWYFLFK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 100a . pep FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 
lllllll II Mill Ml III Mill I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 1 I I 
orf 100-1 FIIGVLNIPEBCMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

70 80 90 100 110 120 



130 140 150 160 170 180 

or f 100a . pep TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
IMIllllllllllllllllllllllllllllllllllllllMllllllllMIIIIII 
orf 100-1 TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKMLSRYLLLAESALNRRDYEAAEANLH 
130 140 150 160 170 180 



190 200 210 220 230 240 

or f 1 00a . pep AAAKMNANLTRLVRLQLRYAFDRGDALQVIAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

tlllll Mill II I IN I II I II I! Ml I Mill I Mill II III Mill ill MM 
orfl00-l AAAKMNANLTRLVRLQLRY AFDRG DALQVLAKTEKLSKAGALGKS EMERYQNWAYRRQLA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 100a . pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
I I I I HI I I I II II I M I I II II I I I I I I II II II I II I I II M II I I II I II II II II I 
orf 100-1 DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 100a . pep FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 
I I I I I M I M I : I II I II M M I I M I M M I II M II II I II I M M II I I II II M I 
orf 100-1 FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 390 400 

orf 100a . pep KPS I SARLVLAKVFDETGE PQKAEAQRNLVLASVAEENRPSA-ETHX 

I 1 I I 1 I I I I I I I 1 I I I Ml Mill Ml Ml :|::::| :| | | 
orfl00-l KPS I S ARLVLAKVFDE IGE PQKAEAQRNLVLEAVSDDERHAALEQHSX 

370 380 390 400 



Homology with a predicted ORF from N.gonorrhoeae 

ORF100 shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORF100ng) from 
N.gonorrhoeae: 






orf 100. pep 
orflOOng 
orf 100. pep 
orflOOng 
orf 100. pep 
orflOOng 
orf 100. pep 
orflOOng 
orf 100. pep 
orflOOng 
orf 100. pep 
orflOOng 
orf 100. pep 
orflOOng 



MKTVWIWLFAAAVGIAIASGIYTGDVYIVLGQTMLRINIJiAEVLGSLIAVVVWYFLFK 60 
II Ml till llllllllll III IMIMM I Mill Mi I MUM III MM lllil I I 

MKTVWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAF\^GSLIAVVVWYBXFK 60 

FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 120 

1111111111:1:1 MUM I I I II M I M I II M I II M II I II II II : ill 

FI IGVLN I PENMRRS GSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 120 

TLALMLXAHAAGQMEN IXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 
I I I I I I llllllllll II I I I I I I I I I M II I II II I II I I I I I M M I I M I I I M 

TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

AAAKMNANLTRLVRMIRYAFDRGDALQVU^TEKLSKAGALGKSEMERYQNWAYRRQLA 240 
I I I II I II I I I I II I Ml II MM Ml MMI III II M I! Ml I M I M I MIM Ml 

AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 240 



DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 
I I I I I I I I I I I II I I II I I II I II I I II I II II II II II II I I I II II II I I I II II II 
DAADAAALKTCLKRI PDS LKNGE LSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 



300 



300 



FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 360 
I M I M I I M II I I I I I II II Ml II I II II II II I I I I M: I II I I I I II II I II II II 
FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 360 



KPS I SARLVLTKVFDE IGE PQKAEAH 386 
MM 11111:11111 :: Mill: 

KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 405 

The complete length ORF100ng nucleotide sequence <SEQ ID 755> is: 



   1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT
  51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC
 101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT
 151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA
 201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG
 251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT
 301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG
 351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA
 401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG
 451 CCGGAAAAAC AGCAGCTTTC CCGCTATCTT CTGCTGGCGG AATCGGCGTT
 501 AAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA
 551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCC
 601 TTCGATCGGG GCGATGCGTT GCAGGTTCTG GCAAAAaccG AAAAACTTTC
 651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG
 701 CATACCGCCG CCAGATGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC
 751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGagcGTATC
 801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT
 851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC
 901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT
 951 CGATTTTGCC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC
1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA
1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG
1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG
1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT
1201 TCCGCCGAAA CCCGTTGA



This encodes a protein having amino acid sequence <SEQ ID 756>: 



  1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
 51 AVVVWYFLFK FIIGVLNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF
101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA
351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP
401 SAETR*



ORF100ng and ORF100-1 show 95.3% identity in 402 aa overlap: 



orf 100-1. pep 
orflOOng 

orf 100-1. pep 
orflOOng 



orf 100-1. pep 
orflOOng 

orf 100-1. pep 
orflOOng 

orf 100-1. pep 
orflOOng 

orf 100-1. pep 
orflOOng 

orf 100-1. pep 
orflOOn 



10 20 30 40 50 60 

MKTNA^IWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

1 I I t I 1 I I I 1 I I 1 I I I I I I 1 ! 1 I I I I 1 1 f t I 1 t I 1 t I t I I I 1 t i I I I 1 I I 1 I I I I I I 1 I I 
MKTVVWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

FI IGVLN I PEKMQRFGS ARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

I I 1 I I I M I ! : I : , I | | I | i I I I I I I 1 I II I I I I I I I I I I I I I I I I I II I MINIM 
FI IGVLNI PENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 
70 80 90 100 110 120 

130 140 150 160 170 180 

TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

MM MM M MMM I It I 1 I M MINIM M MMIMIMI MUM IMMMM 
TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
I I I I I M I II I I I I I I I I M M I M II I I M I If I M I I I M I I I II f M I I 1 I I II I : I 
AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 

190 200 210 220 230 240 

250 260 270 280 290 300 

DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

MM MMM II INI IN INN NUN INI I MMIIIIMM L 1 1 1 1 1 1 1 1 1 1 1 1 
DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
250 260 270 280 290 300 

310 320 330 340 350 360 

FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 
N II II N II N II II I N N : II II 11 N II I N N I N N II II I N N I N I N I I I 
FVE S VRFLGEREQQKAI DFADSWLKEQP DNALLLMYLGRLAYGRKLWGKAKG YLEAS I AL 

310 320 330 340 350 360 

370 380 390 400 

KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

INI IMMMM M :: M MMMMI :|: ::l - I 
KPSI PARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETRX 
370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence, a putative 
transmembrane domain, and an RGD motif, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
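
The RGD motif referred to above can be located with a simple string scan. The following is a minimal sketch and not part of the original analysis: it searches the ORF100ng protein <SEQ ID 756> for the tripeptide and reports 1-based positions. The helper name `find_motif` is hypothetical, and the stop marker is left out of the sequence.

```python
# SEQ ID 756 (ORF100ng), written in blocks of ten residues; stop marker omitted.
ORF100NG = "".join("""
MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
AVVVWYFLFK FIIGVLNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF
EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT
CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA
KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP
SAETR
""".split())


def find_motif(protein: str, motif: str) -> list[int]:
    """Return the 1-based start position of every occurrence of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based residue number
        start = protein.find(motif, start + 1)
    return positions


print(find_motif(ORF100NG, "RGD"))  # -> [203]
```

With the sequence as listed, the only occurrence starts at residue 203, inside the block FDRGDALQVL.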

Example 90 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
757>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATSTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF* 

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 
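
The correspondence between the nucleotide sequence <SEQ ID 759> and the amino acid sequence <SEQ ID 760> can be checked by conceptual translation. The sketch below is illustrative only. It assumes the standard genetic code and reading frame 1, and it writes X for any codon containing an ambiguous base (as in ORF102, where the codon STG is shown as X). The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# SEQ ID 759 (ORF102-1), copied in blocks of ten nucleotides.
SEQ_ID_759 = "".join("""
ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG
GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA
TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG
GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT
CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC
ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC
GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG
CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC
TGTATCTGGT CGTGTTCAAA CCGTTTTGA
""".split())


def translate(dna: str) -> str:
    """Translate frame 1 of dna; stop at the first stop codon, written as '*'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous bases (N, S, ...) are not in the table and become X.
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


print(translate(SEQ_ID_759))
```

The translation gives 142 residues followed by the stop marker, and residues 61-70 and 131-140 read GFGAVVFGAA and VAALYLVVFK respectively.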

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of H. pylori (accession number AE000647) 
ORF102 and HP1484 show 33% aa identity in 143aa overlap: 

orfl02 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62 

F W K FH+ VI SW A LFYLPR+FV A + V++ +LY F++ 

HP1484 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK — KLYSFIASPAM 65 

orfl02 63 GAWFGAAIPFAAG WWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119 

G + + + GW+H KL L ++LLAY YC +R + + R+Y 

HP1484 66 GFTLITGILMLLIEPTLFKSGGWLHAKlJ^VVLLLAYHFYCKKCMREIjEKDPTRRNARFY 125 

orfl02 120 RVFNEIPXXXXXXXXXXXXFKPF 142 

RVFNE P KPF 
HP1484 126 RVFNEAPTILMILIVILWVKPF 148 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF102 shows 99.3% identity over a 142aa overlap with an ORF (ORF102a) from strain A of N. meningitidis: 



orfl02.pep 
orfl02a 



orfl02.pep 
orfl02a 



10 20 30 40 50 60 

MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

II 1 I 1 I t I I 1 I I i 1 I I 1 I 1 M t I I I I I I M I ! I I 1 ! 1 I 1 I I I I I I K I I 1 I t I 1 1 

MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

INI II MINIMI Hi I Mi II II Ml II Ml IN MMI I Ml MINI MINIM 
GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
70 80 90 100 110 120 



130 140 
orf 102 . pep VFNE I PVLLMVAALYXWFKPFX 
I II II N II N II I I I II II II 
orf 102a VFNEIPVLLMVAALYLWFKPFX 
130 140 

The complete length ORF102a nucleotide sequence <SEQ ID 761> is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 762>: 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102a and ORF102-1 show complete identity in 142 aa overlap: 



                      10        20        30        40        50        60
orf102a.pep   MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1      MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf102a.pep   GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1      GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                      70        80        90       100       110       120

                     130       140
orf102a.pep   VFNEIPVLLMVAALYLVVFKPFX
              |||||||||||||||||||||||
orf102-1      VFNEIPVLLMVAALYLVVFKPFX
                     130       140



Homology with a predicted ORF from N. gonorrhoeae 

ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) from N. gonorrhoeae: 



orf 102. pep 
orfl02ng 
orf 102. pep 
orfl02ng 
orf 102. pep 
orfl02ng 



MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I M II I I I I I I I I I I I I I I | I | | | | | | | | | | | | || : | | | | | | | | | | | | | M | | | | | | | | | 
MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 



60 



60 



120 



GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
Ml Ml III IMI I II I I I I I 1 I I I I I I I I J I I I I I I I I I I I I I J I I I 1 I | I f 1 | | f | | 
G FGAWFGAAI PFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120 

VFNEI PVLLMVAALYXWFKPF 142 
I I I M I I I I I I I I I I I I I I I I 
VFNE I PVLLMVAALYLWFKPF 142 



The complete length ORF102ng nucleotide sequence <SEQ ID 763> is: 



  1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG
 51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA
101 TTGATGCGCC GCGCGGCAAT CCCGAGTATG TGCGCCTGTC GGGGATGGCG
151 GTGCGGTTGT ACCGTTTTAT GTCGCCTTTG GGTTTCGGCG CGGTCGTGTT
201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC
251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTATCA GTTGTATTGC
301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG
351 CTGGTACCGC GTGTTCAAcg aAATCCCCGT GCTGCTGATG GTTGCCGCGC
401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA



This encodes a protein having amino acid sequence <SEQ ID 764>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDAPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGRWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 

10 20 30 40 50 60 

orf 102-1. pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
t>N IIMIII III Ml IIMIIIIIIII lllil!:MMII IMI I li IN I || MM | 
orfl02ng MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 102-1 . pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
I M I I 1 1 I 1 1 I I 1 1 1 1 I It Ml 111 II II III MM I II II I I II III MM I I INI I 
orfl02ng G FGAWFGAAI PFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf 102-1 - pep VFNEIPVLLMVAALYLWFKPFX 
M II M II II I II I M I II I II I 
orfl02ng VFNEIPVLLMVAALYLWFKPFX 

130 140 
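
The identity figure quoted for this alignment can be reproduced by counting matching columns. The following is a minimal sketch, assuming that percent identity is the number of identical aligned residues divided by the number of alignment columns; the function name `percent_identity` is hypothetical, and the two sequences are ORF102-1 <SEQ ID 760> and ORF102ng <SEQ ID 764> without the stop marker.

```python
ORF102_1 = "".join("""
MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF
""".split())

ORF102NG = "".join("""
MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDAPRGN PEYVRLSGMA
VRLYRFMSPL GFGAVVFGAA IPFAAGRWGS GWVHVKLCLG LMLLAYQLYC
GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF
""".split())


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


print(f"{percent_identity(ORF102_1, ORF102NG):.1f}% identity "
      f"in {len(ORF102_1)} aa overlap")
```

The two proteins differ only at residues 36 (V/A) and 77 (W/R), so 140 of 142 columns match, which rounds to 98.6%.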

In addition, ORF102ng shows significant homology to a membrane protein from H.pylori: 

gi|2314656 (AE000647) conserved hypothetical integral membrane protein 
[Helicobacter pylori] Length = 148 
Score = 79.2 bits (192), Expect = 1e-14 

Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%) 

FS W FKLFH LFFV I S W FAGL FYLPRI FVNMAM I DAPRGN PE YVRLSGMAVRL YR FMS PLG F 62 
F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 

FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGVVQIQEK — KLYSFIASPAM 65 



Query: 


3 


Sbj ct : 


8 


Query: 


63 


Sbjct: 


66 


Query: 


116 


Sbjct: 


122 



115 



GAWFGAAI P FAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQD YSNAFS 

G + + F +G GW+H KL L ++LLAY YC +R + + 
GFTLITGILMLLIEPTLFKSG GWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRN 121 

HRWYRVFNEIPXXXXXXXXXXXXFKPF 142 
R+YRVFNE P KPF 
Based on this analysis, it is predicted that these proteins from N.meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 91 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 765>: 



1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA 

101 TTACGGAAAC GGTCAGGCGC GGC // 

//. . ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT 

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT 

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC 

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG 

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC 

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC 

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA 

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG 

451 CCGCGCCGAT AA 



This corresponds to the amino acid sequence <SEQ ID 766; ORF85>: 



251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 
301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM 
351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 



Further work revealed the further partial nucleotide sequence <SEQ ID 767>: 



1 ..GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 

51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 

101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT 

151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 

201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 

251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC 

301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA 

351 GTCGGAATTG GGCTACACGC GCATTACCGC AACGATGGAC GGCACGGTGG 

401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG 

451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA 

501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT 

551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 

601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC 

651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA 

701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA 

751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA 

801 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG 

851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA 

901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC 

951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC 

1001 GATAA 



This corresponds to the amino acid sequence <SEQ ID 768; ORF85-1>: 



1 ..VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY 

51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD AFAAAKANVA 

101 ELKALIRQSK ISINTAESEL GYTRITATMD GTVVAILVEE GQTVNAAQST 

151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS 

201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE 

251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV 

301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR* 



1 MAKMMKWAAV AAVAAAAVWG GWS.LKPEPH VLDITETVRR G 
51 
101 
151 
201 ISFTILSEPDT 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF85 shows 87.8% identity over a 41aa overlap and 99.3% identity over a 153aa overlap with 
an ORF (ORF85a) from strain A of N. meningitidis: 

10 20 30 40 

orf 85 . pep MAKMMKWAAVAAVAAAAVWGGW S -LKPE PHVLDITETVRRG 

I 1 I I I 1 I I I I I I I I 1 1 t I 1 I 1 1 I Mill:: I I I I 1 1 1 1 

orf 85a MAKMMKWAAVAAVAAAAVWGGWS YLKPE PQAAY ITETVRRGDI SRTVS ATGE IS PSNLVS 

10 20 30 40 50 60 

// 

80 90 100 

orf 85. pep ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

I I I I I I I I I I I I I I I I I II I I I I I I I I I ! I 
orf 85a TIVQIANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSSG 
210 220 230 240 250 260 

110 120 130 140 150 160 

orf 85 .pep G YNS ST DTASNAVYYYARS FVPN P DGKLATGMTTQNT VE I DGVKNVLI I PSLTVKNRGGK 
MM II MM III III II I II 111 1 1 tilt II I 111 III lit llllll II I II III III: 
orf 85a GYNS S T DT AS NAVY YY AR S FV PN P DGKLATGMTTQNT VE I D GVKNVL 1 1 PSLTVKNRGGR 

270 280 290 300 310 320 

170 180 190 200 210 220 

or f 8 5 . pep AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 

II I I I I I I II I I i I I I I I I I I I II I II I I I I I II I I I I I II M I M I I I II I M I 1 M 11 
orf 85a AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 

330 340 350 360 370 380 

230 

orf 8 5. pep PRRX 

mi 

orf85a PRRX 
390 

The complete length ORF85a nucleotide sequence <SEQ ID 769> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA 

101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA 

151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG 

201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA 

551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG 

1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG 

1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 770>: 

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT VVAILVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 

30 40 50 60 70 80 

or f 8 5a . pep PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

I I II i I I I I II I I 1 I M I I I I I I I I I I I I 
orf 8 5-1 VSVGAQASGQIKI LYVKLGQQVKKGDLIAE 

10 20 30 

90 100 110 120 130 140 

or f 8 5a . pep INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 
~ * II INI I II MM III llillll MM Mill II II Mil I I MM:: I 1:11111! Ill 
or f 8 5-1 INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 

40 50 ' 60 70 80 90 

150 160 170 180 190 200 

or f 8 5a . pep ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 

I : I i II I I I I II II II I I I I I II M I I II M I II II M I I I 1 II I II I II 11 I M II I II 
orf85-l AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 

100 110 120 130 140 150 

210 220 230 240 250 260 

orf 85a . pep PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 

II M M I I I! I I I M I I I It II I I I I I M M I II II I I I I II II I I II II I II I I II I I I 
orf 85-1 PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 

160 170 180 190 200 210 

270 280 290 300 310 320 

orf 85a. pep GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLI I PSLTVKNRGG 
II II II II I I II I I I II II I I 11 II I I I I M I 1 II II I I I II I M II I M M II II 1 M I 
orf 85-1 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVE IDGVKNVLI IPSLTVKNRGG 

220 230 240 250 260 270 

330 340 350 360 370 380 

or f 85a . pep RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
i E f f 1 1 t 1 I I I 1 1 I I I t I I E I I I I I 1 I I I 1 I I I 1 t I 1 I 1 I t I I I I i i E I I 1 I I I t i 1 1 t I 
orf 85-1 KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 



            390
orf85a.pep   PPRRX
             |||||
orf85-1      PPRRX

Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a. 

Homology with a predicted ORF from N.gonorrhoeae 

ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N. gonorrhoeae: 



ORF85 
0RF85ng 

ORF85 

ORF85ng 

ORF85 

ORF85ng 

ORF85 

ORF85ng 

0RF85 

ORF85ng 



1 MAKMMKWAAVAAVAAAAVWGGWS . LKPEPHVLDITETVRRG . . 40 

M I I I I I I ! It I II I I i II ! II I Mill:: IIMllil 
1 MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITEAVRRGDISRTVSAT 50 



201 



ISFTILSEPDT 250 

I I I 1 I K I f t I I 

TVNAAQSTPTIVQLANLDMMLNKMQIAEGDITKVBCAGQDISFTILSEPDT 250 



251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300 

I I I II I M II I I 11 I I II I M II II II II I II I M I II I I M 11 I I M II 
251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300 

301 MTTQNTVEI DGVKNVL 1 1 PSLTVKNRGGKAFVRVLGADGKAAEREIRTGM 350 

M I M I M I M I I I II : 1 1 II II M I II M II I M I II M I MM MM 
301 MTTQNT VE I DGVKNVLLI PS LT VKNRGGKAFVRVLGADGKAVERE IRTGM 350 

152 RDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 393 

: 11 1 I I 1 1 M I I II I M II I M 1 M M II II II II I I II I I I 
351 KDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 393 
The complete length ORF85ng nucleotide sequence <SEQ ID 771> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA 

101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG 

151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTTCGGG 

201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA 

551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG 

1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG 

1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 772>: 



1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITEAVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT VVAIPVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM 

351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85ng and ORF85-1 show 96.1% identity in 334 aa overlap: 

30 40 50 60 70 80 

orf85ng PQAAYI TET VRRG DI SRTVS ATGEI S PSN LVS VGAQAS GQIKKLYVKLGQQVKKGDLI AE 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
orf85-l VSVGAQASGQIKILYVKLGQQVKKGDLIAE 

10 20 30 



90 100 110 120 130 140 

orf85ng INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEDLESAQD 
1111:1111:: I i I I I I 1 II I I I I I I I I I I I I I I I I I I I I I I I I I :: I M I I I I I I I I I 
orf85-l INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 

40 50 60 70 80 90 

150 160 170 180 190 200 

orf85ng AIJU\AKANVAELKALIRQSKISINTAESDLGYTRITATMDGTWAIPVEEGQTVNAAQST 
I: I II MM I I II II I MM I Ml 1111:1 till Mill III MM MM MUM III 
orf85-l AFAAAKANVAELKALIRQSKI SINTAESELGYTRITATMDGTWAI LVEEGQTVNAAQST 

100 110 120 130 140 150 

210 220 230 240 250 260 

orf85ng PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
IK I 1 1 I I M M 1 1 I 1 1 I I I It 1 I I 1 1 I ! 1 1 M M I I M I I I I M I I II I I t I I M I I 1 1 I 
orf85-l PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 

270 280 290 300 310 320 

or f 85ng GGYNS ST DTASNAVYYYARS FVPN P DGKLATGMTTQNTVEI DGVKNVLLI PSLTVKNRGG 

II II M I I I II I II I I I II I II II I II I I I II I II II II II 1 I I I II I : I II I II II I II 
orf85-l GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLI I PSLTVKNRGG 

220 230 240 250 260 270 



330 340 350 360 370 380 
orf85ng KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
I I I I I I I i I I t I I : I I I I I I I t : 1 I I I I I I I I I I I I I i I I I I I I I I I t I I I I I I I M I I ! 
orf85-l KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 



            390
orf85ng      PPRRX
             |||||
orf85-1      PPRRX



In addition, ORF85ng shows significant homology to an E.coli membrane fusion protein: 

gi | 1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from 
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia 
coli] Length = 380 
Score = 193 bits (485), Expect = 2e-48

Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%)

Query: 29 PQAAYITETVRRGDI SRTVSATGE I SPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88 

P Y T VR GD+ ++V ATG++ V VGAQ SGQ+K L V +G +VKK L+ 

Sbjct: 41 PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100 

Query: 89 INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148 

1+ N I ++ L +A+ A+ L A Y RQ L + A S++ 

Sbjct: 101 IDPEQAENQIKEVEATLMELRAQRQQAEAELKLARVTYSRQQRLAQTKAVSQQDLDTAAT 160 

Query: 149 XXXXXXXXXXXXXXXIRQSKI S INTAE S DLGYTRI TATMDGTWAI PVEEGQTVNAAQST 208 

I++++ S++TA+++L YTRI A M G V I +GQTV AAQ 
Sbjct: 161 EMAVKQAQIGT I DAQIKRNQASLDTAKTNLDYTRIVAPMAGEVTQITTLQGQTVI AAQQA 220 

Query: 209 PTIVQIANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 268
P 1+ LA++ ML K Q++E D+ +K GQ FT+L +P T + ++ VP 1 
Sbjct: 221 PNILTLADMSAMLVKAQVSEADVIHLKPGQKAWFTVLGDPLTRYEGQIKDVLP 273 

Query: 269 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 328 

+ + ++A4-+YYAR VPNP+G L MT Q +++ VKNVL IP + + G 
Sbjct: 274 T PEKVN DAI FY YARFE V PN PNG LLRLDMT AQVH I QLT D VKNVLT I PL S ALGDP VG 328 

Query: 329 KAFVRV-LGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKWISE 372 

+V L +G+ ERE+ G ++ + E+ GL+ GD+WI E 
Sbjct: 329 DNRYKVKLLRNGETREREVTIGARNDTDVEIVKGLEAGDEWIGE 373 
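The percentages in a BLAST summary line can be recomputed from the reported counts. The following is a minimal sketch, not part of the original analysis; the function names `parse_blast_stats` and `fraction_percent` are hypothetical, and the input string is simply the summary line of the alignment above.

```python
import re


def parse_blast_stats(line):
    """Extract 'Name = numerator/denominator' pairs from a BLAST summary line."""
    stats = {}
    for name, num, den in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        stats[name] = (int(num), int(den))
    return stats


def fraction_percent(num, den):
    """Return num/den expressed as a percentage."""
    return 100.0 * num / den


# Summary line of the ORF85ng vs. E.coli o380 alignment (illustrative input)
line = "Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%)"
stats = parse_blast_stats(line)
for name, (num, den) in stats.items():
    print(f"{name}: {num}/{den} = {fraction_percent(num, den):.1f}%")
```

The recomputed values can then be compared with the percentages printed in the report.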

Based on this analysis, it was predicted that the proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



ORF85-1 (40.4kDa) was cloned in the pGex vectors and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A 
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein 
was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis 
(Figure 19C), and ELISA (positive result). These experiments confirm that ORF85-1 is a 
surface-exposed protein, and that it is a useful immunogen. 



Example 92 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 773>: 

1 . .ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT 

51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA 

101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG 

151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG 

201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT 
251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG
301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA 
351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA 
401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG 
451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC 
501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC
551 CGTAA 

This corresponds to the amino acid sequence <SEQ ID 774; ORF120>:

1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR

51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG

101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP

151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP*

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 
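The listings in this document give each sequence as numbered rows of 10-residue blocks. A minimal sketch of how such a listing could be reassembled into one contiguous string, checking each row's position number on the way, is given below; the function name `parse_listing` is hypothetical, and the sketch assumes every row begins with the 1-based position of its first residue.

```python
def parse_listing(text):
    """Join numbered rows of sequence blocks into one string.

    Raises ValueError if a row's leading position number does not match
    the number of residues read so far plus one.
    """
    blocks = []
    length = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines between rows
        position = int(tokens[0])
        if position != length + 1:
            raise ValueError(f"row labelled {position}, expected {length + 1}")
        for block in tokens[1:]:
            blocks.append(block)
            length += len(block)
    return "".join(blocks)


# First two rows of <SEQ ID 775>
listing = """
1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT
"""
sequence = parse_listing(listing)
print(len(sequence), sequence[:12])
```

A row in which a block has lost or gained a character would make the next row's position check fail, which helps locate transcription errors.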

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY

201 TDDGKTYTLK LKSVQINGQA AKP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF120 shows 92.4% identity over a 184aa overlap with an ORF (ORF120a) from strain A of N. 
meningitidis: 

10 20 30 

or f 120. pep I PATMTFERSGNAYKIVSTI KVPLYNIRFE 

MM: II IMIIIIIIIIIIIII 

orfl20a SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVPLYNIRFE 
10 20 30 40 50 60 

40 50 60 70 80 90 

orfl20.pep SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 
M I I I I M II I M I I 1 1 I II M i M M I I II II I I II M : I I I I ! I M I I I I I I I 
orfl20a SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAMDLFTLAWQL 
70 80 90 100 110 120 

100 110 120 130 140 150 

orf 120 . pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
M III Mill MMIMMMM Ml IIIIMIIIM III II j Ml IIIMM MM Ml 
orf 120a AANDAKLPPGLKITNGKKLYSVGGLNKAGT GKYSIGGVET EWKYRVRRG DDAVMYFFAP 

130 140 150 160 170 180 
160 170 180

orf120.pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
           |||||||||||||||||||||||||||||||||||
orf120a    SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
190 200 210 220

The complete length ORF120a nucleotide sequence <SEQ ID 777> is:

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC
151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG
201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT
251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC
301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA
551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA
651 CGGCCAGGCA GCCAAACCGT AA



This encodes a protein having amino acid sequence <SEQ ID 778>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX

51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD

101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY

201 TDDGKTYTLK LKSVQINGQA AKP*

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 

10 20 30 40 50 60 

orf 120a . pep MMKT FKN I FS AAI L S AALPCAYAAGLPXS AVLHYSGS YGI PATXXXXXXXNAXKI VST IK 
I I 1 I I I I I I I I i 1 I I I I I I I I I 1 I I I I I I J I I I I I I I I 1 1 I 1 : il 1 1 I 1 I i I 

orf 120-1 MMKT FKN I FS AAI LSAALPCAYAAGLPQSAVLHYSGS YGI PATMTFERSGNAYKIVST IK 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 120a . pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM 
I I I 1 I I I I I II I II I I I i I I I ! I I I I I I I I I I I I I M I I I I ! I I I I I I : I I II I I 
orf 120-1 VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 120a . pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 
I I I I I I I I I I I II I I I I I I II I I I I II I I I I II I I I I I I I I M I I M I I 1 I I I I I I I I I I 
orfl20-l DLFTLAWQLAANDAKL P PGLKITNGKKLYSVGGLNKAGTGKYS IGGVETEWKYRVRRGD 

130 140 150 160 170 180 

190 200 210 220 

orf 120a . pep DAVMYFFAPSLNN I PAQ IGYTDDGKTYTLKLKS VQINGQAAKPX 

mi in iii mil i ii MiitiiMi i m i mm mi in 

orf 120-1 DAVMYFFAPS LNN I PAQIGYTDDGKTYTLKLKS VQINGQAAKPX 

190 200 210 220 
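The identity figure quoted for an ungapped alignment is the number of matching positions divided by the overlap length. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, and it assumes that an undetermined residue (X) counts as a mismatch, which appears consistent with the reported 93.3% (208 matches in 223 positions).

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues in two aligned, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Residues 1-60 of ORF120a and ORF120-1, taken from the alignment above
orf120a = "MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK"
orf120_1 = "MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK"
print(f"{percent_identity(orf120a, orf120_1):.1f}% identity over {len(orf120a)} aa")
```

Over this first block the nine undetermined positions in ORF120a lower the identity well below the figure for the full overlap, since most of the X residues fall in this region.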



Homology with a predicted ORF from N. gonorrhoeae 

ORF120 shows 97.8% identity over a 184 aa overlap with a predicted ORF (ORF120ng) from
N.gonorrhoeae:

orf120.pep                               IPATMTFERSGNAYKIVSTIKVPLYNIRFE 30
                                         ||||||||||||||||||||||||||||||
orf120ng   SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIKVPLYNIRFE 69

orf120.pep SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 90
           ||||||||||||:||:||||||||||||||||||||||||||||||||||||||||||||
orf120ng   SGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 129

orf120.pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP 150
           ||||||||||||||||||||||||||||||||||||||||||||||||||||:| |||||
orf120ng   AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDTVTYFFAP 189

orf120.pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 184
           ||||||||||||||||||||||||||||||||||
orf120ng   SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 223

The complete length ORF120ng nucleotide sequence <SEQ ID 779> is:

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATGCG
201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT
251 ATAAAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCGGAC
301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA
551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA
651 CGGACAGGCC GCCAAACCGT AA



This encodes a protein having amino acid sequence <SEQ ID 780>: 

1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY

201 TDDGKTYTLK LKSVQINGQA AKP* 

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap:

10 20 30 40 50 60 

orf 120-1 . pep MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 
Hi I MM II I IN M III III II M II MNII I Ml M I II III I MINIM Ml I 
orfl20ng MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 120-1 . pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 
M I II II I II M I I I I I II I I : I I : II I I I I I I | | | | I || || | | | | | | || | | | | | | | || | 
orfl20ng VPLYNIRFESGGTWGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGE SKTEQSPKAM 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf120-1.pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf120ng     DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD

130 140 150 160 170 180 



190 200 210 220

orf120-1.pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
             |:| ||||||||||||||||||||||||||||||||||||||||
orf120ng     DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
190 200 210 220

This analysis, including the presence of a putative leader sequence in the gonococcal protein,
suggests that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 93 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 781>:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC

51 . GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATT. . 

This corresponds to the amino acid sequence <SEQ ID 782; ORF121>: 

1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNI. . 

Further work revealed the complete nucleotide sequence <SEQ ID 783>: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA

551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT

651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC 

751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This corresponds to the amino acid sequence <SEQ ID 784; ORF121-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DT LTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNIVS SI GNLLLLPLLL YYFLL DWQRW SCGIAKLVPR RFAGAYTRIT 
201 GNLNEVLGEF LRGQ LLVMLI MGLVYGLGLV LV GLDSGFAI GMLAGILVFV 
251 PYLGAFTGLL LAT VAALLQF GSWNG ILSVW AVFAVGQFLE SF FITPKIVG 
301 DRIGLSPFWV IFSLMAFGQL MG FVGMLAGL PLAAVTLVLL REGVQKYFAG 
351 SFYRGR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF121 shows 98.7% identity over a 156aa overlap with an ORF (ORF121a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 121 . pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
I I I I i i I I I I I I I II I I t I L i I 1 1 I I i 1 1 1 1 1 I I i 1 1 K I I I t I I 1 I I 1 1 I } 1 1 1 1 I M 
orf 121a MYRRKGRG I KPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

10 20 30 40 50 60 



70 80 90 100 110 120

orf121.pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121a    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
70 80 90 100 110 120

130 140 150

orf121.pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI
           ||||||||||||||||||||||||||||||||||||
orf121a    EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
130 140 150 160 170 180

orf121a    SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
190 200 210 220 230 240

The complete length ORF121a nucleotide sequence <SEQ ID 785> is: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT 

751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This encodes a protein having amino acid sequence <SEQ ID 786>: 

1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 

151 RQGGNIVS SI GNLLLLPLLL YYFLL DWQRW SCGIAKLVPR RFAGAYTRIT 

201 GNLNEVLGEF LRGQ LLVMLI MGLVYGLGLV LV GLDSGFAI GMVAGILVFV 

251 PYLGAFTGLL LA TVAALLQF GSWN GILAVW AVFAVGQFLE SF FITPKIVG 

301 DRIGLSPFWV IFSLMAFGQL MG FVGMLAGL PLAAVTLVLL REGVQKYFAG 

351 SFYRGR* 

ORF121a and ORF121-1 show 99.2% identity in 356 aa overlap:

10 20 30 40 50 60 

orf 121a . pep MYRRKGRG I K PWM DAG AAFAALVWLVFALG DTLT P FAVAAVLAYVL D PL VEW LQKKG LNR 
MMIMMMM IMIIIIMMIIIIIIIIIIIIIIIIIIMlllllllllllllll 
or f 1 2 1 - 1 MYRRKGRG IKPWMGAGAAFAALVWLVFALGDTLT PFAVAAVLAYVLDPLVEWLQKKGLNR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 121a . pep ASASMSVMVFSLI LLLALLLI I VPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

I I I I I M M I M M I M M M I I I M 1 1 M II I I I I M I M M I I I I I I I I M 1 1 I I M I 
orf 121-1 ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 121a . pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 

I I I 1 1 I I I I I M M I I 1 1 I M I II I M M I M I M I II I I M II 1 1 I I M 1 1 I li 1 1 1 1 1 
orf 12 1- 1 EI DQAS I IAWLQAHTGELSNALKAWFPVLMRQGGN I VSSIGNLLLLPLLL YYFLLDWQRW 

130 140 150 160 170 180 



190 200 210 220 230 240

orf121a.pep SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1    SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 121a pep GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 
| I : I 1 1 1 1 1 I I M I I I I II I M I 1 1 I I I I I I I I I I I 1 : 1 1 1 1 II 1 1 I I I 1 1 1 ! I I 1 1 I I I 
orf 121-1 GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG 

250 260 270 280 290 300 

310 320 330 340 350 

orf121a.pep DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1    DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX

310 320 330 340 350 

Homology with a predicted ORF from N. gonorrhoeae 

ORF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (ORF121ng) from 
N. gonorrhoeae: 

orf121.pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60
           |||||||||||||||| |||||||||:|||||||||||||||||||||||||||||||||
orf121ng   MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60

orf121.pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng   ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120

orf121.pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI 156
           ||||||||||:|||||||||||||||||||:|||||
orf121ng   EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW 180

An ORF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino 
acid sequence <SEQ ID 788>: 

1 MYRRKGRG IK PWMGAGAAFA ALVWLVYALG DT LTPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 

151 KQGGNIVS TI GNLLLPPLLL YYFLL DWHRW SCGIPKLVPR RFAGAYTRIT 

201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW 

251 GGG* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 789>:

1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AAACAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCCGCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG 

601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT 

651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC 

701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC 

751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 
This corresponds to the amino acid sequence <SEQ ID 790; ORF121ng-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 

151 KQGGNIVS SI GNLLLPPLLL YYFLL DWQRW SCGIAKLVPR RFAGAYTRIT 

201 GNLNEVLGEF LRGQL LVMLI MGLVYGLGLM LV GLDSGFAI GMVAGILVFV 

251 PYLGAFTGLL LA TVAALLQF GSWNG ILAVW AVFAVGQFLE SF FITPKIVG 

301 DRIGLSPFWV IFSLMAFGEL MG FVGMLAGL PLAAVTLVLL REGAQKYFAG 

351 SFYRGR* 



ORF121ng-1 and ORF121-1 show 97.5% identity in 356 aa overlap:

10 20 30 40 50 60

orf121-1.pep MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
             ||||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||
orf121ng-1   MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
10 20 30 40 50 60

70 80 90 100 110 120

orf121-1.pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng-1   ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
70 80 90 100 110 120

130 140 150 160 170 180

orf121-1.pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
             ||||||||||:|||||||||||||||||||:|||||||||||||| ||||||||||||||
orf121ng-1   EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW
130 140 150 160 170 180

190 200 210 220 230 240

orf121-1.pep SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
             |||||||||||||||||||||||||||||||||||||||||||||||||:||||||||||
orf121ng-1   SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLMLVGLDSGFAI
190 200 210 220 230 240

250 260 270 280 290 300

orf121-1.pep GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
             ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121ng-1   GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
250 260 270 280 290 300

310 320 330 340 350

orf121-1.pep DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
             ||||||||||||||||||:||||||||||||||||||||||||:|||||||||||||
orf121ng-1   DRIGLSPFWVIFSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKYFAGSFYRGRX
310 320 330 340 350



In addition, ORF121ng-l shows homology to a permease from H.influenzae: 

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349
Score = 69.9 bits (168), Expect = 2e-11

Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%)

Query: 26 VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84 

+Y GD + P +A VL+Y+L+ + +L Q R A++ + VP 

Sbjct: 32 IYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91 

Query: 85 MLVGQFNNLASRLPQLIGFMQNTLLPWLKNT IGGYVE- 1 DQAS IIAWFQAHTGELSNALK 143 
ML Q +LSLP+ N WLN YEID + + + F+ ++ + 

Sbjct: 92 MLWNQTISLLSDLPAMF NKSNEWLLNLPKN YPELI DYSMVDSI FNS VREKILGFGE 147 

Query: 144 AWFPVLMKQGGNIVSSIGNXXXXXXXXXXXXXDWQRWSCGIAKLVPRRFAGAYTRITGWL 203 

+ + + N+VS D G+++ +P+ A+ R + 

Sbjct: 148 SAVlCLSLASIMNLVSLGIYAFLVPLMMFFMIiCDKSELLQGVSRETjPKNRNLAFXRWK-EM 206 

Query: 204 NEVXGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVF7PYXXXXXXXXXXX 263 

+ + ++ G+ + + G+ V VPY 

Sbjct: 207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266 
Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323

QFG + FAV QL+ +P+ + + L P +1 S++ FG L GF 

Sbjct: 267 LVALFQFGI SPTFWYI I IAFAVSQLLDGNLLVPYLFSEAVNLHPLI 1 1 ISVLI FGGLWGF 326 

Query: 324 VGMLAGL PLAAVTLVLL 340 

G+ +PLA + ++ 
Sbjct: 327 WGVFFAI PLATLVKAVI 343 

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the two proteins, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
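The text does not state how the putative transmembrane domains were predicted. As an illustration only, a sliding-window hydropathy scan can flag strongly hydrophobic stretches of the kind found in membrane-spanning segments. The sketch below assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a threshold of 1.6; these are conventional choices made here, not parameters taken from this document, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source names no method).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average Kyte-Doolittle value over a peptide segment."""
    return sum(KD[aa] for aa in segment) / len(segment)


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return 0-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [
        i for i in range(len(seq) - window + 1)
        if mean_hydropathy(seq[i:i + window]) > threshold
    ]


# Residues 1-100 of ORF121-1 <SEQ ID 784>
orf121_1 = (
    "MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLV"
    "EWLQKKGLNRASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQL"
)
print(hydrophobic_windows(orf121_1))
```

Runs of consecutive start positions in the output indicate candidate hydrophobic segments; the charged N-terminal residues fall below the threshold.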

Example 94 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 791>: 

1 . . ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT 

51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC 

301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT

401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC

451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC

501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG..

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>: 

1 . . TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC 

101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ.. 

Further work revealed the complete nucleotide sequence <SEQ ID 793>: 

1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC 

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 

101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG 

151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 

751 CGTCATCGTT TGTGTTCCTG A 

This corresponds to the amino acid sequence <SEQ ID 794; ORF122-1>:

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS 

51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR

101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC 

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 

251 RHRLCS* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF122 shows 94.0% identity over a 182aa overlap with an ORF (ORF122a) from strain A of N.
meningitidis: 



10 20 30 

orf 122 - pep TAFS AALRLS PSXLVI FLS FGKPYQQTAAI 

I I I I I I : I I I I : I I I I I I I I I I M I I I ! 
orf 122a FLPLLPKASMKKLMVEPVPMPMYS FSGTNSTAFSAAMRLS SSCWI FLS FGKPYQQTAAI 

30 40 50 60 70 80 

40 50 60 70 80 90 

orf 122 . pep LT FFCT SC PPRSNAYQQYRRLRLYAFH P PE I AE FFVG FAFDVDARNVYAQIGGDVGTHLR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I j I I I I I | | | | | | | | | 
orf 122a LT FFXT S C PPRSN P YQQYRRLRLYAFHAPE I TE FFVGFAFXVDARNVYAQIGGDVGTHLR 

90 100 110 120 130 140 

100 110 120 130 140 150 

orf 122 . pep NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 

l:HI llllllllllll II I Mill I II MUM II II M III I M II I I till 

or f 1 2 2 a NMRRE FGFLCNHGRI DI DRLPTLRLNALIRRTQKDAAVRI FELCGGVGEMAADIAQTCRT 

150 160 170 180 190 200 

160 170 180 

orf 122 . pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 
. I ! { M I I I I I I I I M I I I I I M I I I I II I II I 
orf 122a EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX 

210 220 230 240 250 

The complete length ORF122a nucleotide sequence <SEQ ID 795> is:

1 ATATCATATT GGGCAAGCAG TTCACTGGAT TTTTTGGAAG TAGATACCGC
51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA
101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG
151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC
301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG
351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
601 GAGCAGCGCG TGGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT
751 CGTCATCGTT TGTGTTCCTG A



This encodes a protein having amino acid sequence <SEQ ID 796>:

1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR
101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV
251 RHRLCS*



ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 



10 20 30 40 50 60 

orf 122a . pep I S YWAS S SLDFLEVDTAPLI FLPLLPKASMKKLMVEPVPMPMYS FSGTNSTAFSAAMRLS 

nimn mm im ii i mmmm im ii i m mm mmm i m i m mi m mm 

orf 122-1 I SYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYS FSGTNSTAFSAAMRLS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 122a . pep SSCWIFLSFGKPYQQTAAILTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAF 
M M I M M I I M M I 1 1 I I 11 1 I MINIM I I 1 i I f 1 I 1 1 I I I MhlllllMI 
orf 122-1 SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 
70 80 90 100 110 120



130 140 150 160 170 180 

orf 122a . pep XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 
Mill II III Ml II llilf:IIM! I I MM II I I lllltl IIIIIIIIMIIIMM 
orf 122-1 DVDARNVYAQIGGDVGTHIJWVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 122a . pep FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 
II Ml Ml IMIIMI III Ml I III I II MM I II Ml I III il MMIM II Mill I 
orf 122-1 FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

190 200 210 220 230 240 

250 

orf 122a . pep DIVALSDTDVRHRLCSX 
lllllllllllllllll 
orf 122-1 DIVALSDTDVRHRLCSX 

250 



Homology with a predicted ORF from N. gonorrhoeae

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORF (ORF122ng) from 
N. gonorrhoeae: 

orf122.pep TAFSAALRLSPSXLVIFLSFGKPYQQTAAI 30
M I I M : M I I : M M M I I M M M M '
orf122ng FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCWIFLSFGKPYQQTAAI 80

orf122.pep LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 90
I IN I I! Mill I I I M i M M I I I I I I I I I I 1 I i I M ! : I M I : : M M M M M I
orf122ng LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR 140

orf122.pep NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 150
IN I IIIIIIIIIIM:IIIIIIIIIIIIIIIIIIIIIIIIIIIII:IIII:IIIIII
orf122ng NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT 200

orf122.pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182
llllllllllhll : lllllllllllllll
orf122ng EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 256

The complete length ORF122ng nucleotide sequence <SEQ ID 797> is:



1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC
51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa
101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG
151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT
201 TTTAtccttt gGGAAaccct atcaAcaAAc agccgccatC TTAACATTTT
251 TTTGCACGtC ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc
301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC
601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT
751 CGTCATCGTT TGTGTTCCTG A



This encodes a protein having amino acid sequence <SEQ ID 798>: 



1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR
101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC
151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT
201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI
251 RHRLCS*

ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap:

10 20 30 40 50 60 

orf 122-1 . pep ISYWASSSPDFI^VDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 
:M II I I IN 111:1 M III I I ill I INI ill II I Mil: Mill I 111 Mil III! I 
orfl22ng MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 122-1 • pep SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 
M I M II MM I I I I I I II I II II I II Mill I I I I I I I I I I II II I I I I I I I I I I II 
orfl22ng SSCWIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 122-1 . pep DVDARNVYAQIGGDVGTHIJ^RREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 
I Ml II: MM MM II Ml II I M II I M M II II : M M M II I I I II M II I I II 
orfl22ng DIDARNI DTQIGGDVGTHLRNVRCEFGFLCNHGRI DI DHLPTLRLNALIRRTQKDAAVRI 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 122-1 . pep FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 
MMMM:MM:IMiMMMIMMM:M : II I II II I I II M II II I I I I M 
orfl22ng FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV 

190 200 210 220 230 240 

250 

orf 122-1 .pep DIVALSDTDVRHRLCSX 
MM MM 1:1 Mil I I 
orfl22ng DIVALSDTDIRHRLCSX 

250 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
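
The identity percentages quoted for the alignments above can be illustrated with a short sketch. It assumes two already-aligned sequences of equal length, with `-` marking gaps, and counts identical residues over the aligned length. The function name `percent_identity` is hypothetical, and the alignment program that produced the figures in this document may define the overlap differently.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("aligned sequences must not be empty")
    # A column counts as identical only if neither side is a gap.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Final alignment block of ORF122-1 versus ORF122ng (X marks the stop position).
orf122_1_tail = "DIVALSDTDVRHRLCSX"
orf122ng_tail = "DIVALSDTDIRHRLCSX"
tail_identity = percent_identity(orf122_1_tail, orf122ng_tail)
```

Here the two 17-column segments differ at a single position (V versus I), so `tail_identity` is 16/17 expressed as a percentage.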



Example 95 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 799>:

1 . . GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT 

51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 

101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 

151 ATGgGGCGGA TTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence <SEQ ID 800; ORF125>: 

1 ..AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP
51 MGGFDCRLFR LETA* 
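
As an illustration of how a DNA sequence corresponds to an amino acid sequence, the following sketch translates DNA in a single reading frame using the standard genetic code. The names `translate` and `CODON_TABLE` are hypothetical, and the choice to emit `X` for codons containing ambiguous bases (such as `N`) is an assumption; it is not presented as the software used to prepare the sequences in this document.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, ordered to match product(BASES, repeat=3): TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with ambiguous bases become 'X'."""
    dna = "".join(dna.split()).upper()  # drop spacing between blocks, normalise case
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 nucleotides of the partial sequence <SEQ ID 799>.
partial_dna = "GCCGGCGCGA GTGCGAACAA CATTTCCGCG"
partial_protein = translate(partial_dna)
```

Applied to these 30 nucleotides, the sketch yields the first ten residues listed for ORF125, `AGASANNISA`.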

Further work revealed the complete nucleotide sequence <SEQ ID 801>: 



1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT 

501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG 

601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC 
801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA

851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 

901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT 

1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT 

1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 

1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 

1201 TCTTTACAAA GGAACCCGTC ATGA 

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-1>:



1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL
301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF
351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ
401 SLQRNPS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 


ORF125 shows 76.5% identity over a 51aa overlap with an ORF (ORF125a) from strain A of N. 
meningitidis: 



10 20 30 

orf 125 . pep AGASANNI SARFAETPVAVSVTLIGTVLAV 

1 j : | | | | 1 M : : : | |:||:|:::J|:||| 
orf 125a KILLGAGLGAAGILAVVLSTVTTTFLDAYSAGVSANNI SAKLSE I P IAVAVAWGTLLAV 

250 260 270 280 290 300 



40 50 60 

orf 125 . pep MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX 

: MINI II MM II t II II I: 
orf 125a LLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 

310 320 330 340 

The ORF125a partial nucleotide sequence <SEQ ID 803> is:

1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT 

501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG 

601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC 

801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA 

851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT 

901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C. . 

This encodes a protein having the partial amino acid sequence <SEQ ID 804>: 



1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV
301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG..



ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap:



10 20 30 40 50 60 

orf 125a . pep MSGNAS SXS S SAAIGLI WFGAAVS IAEI STGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
II Mill I I 1:1 M I 11 II III I Ml I I M II Ml ! I I I I I I II I M I II I III I! I I t 
orf 125-1 MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 40 50 60 

70 80 90 100 110 120

or f 125a . pep AYIGALTGXXSMESVRLS FGKRGSVLFSVANMLQLAGWTAVMI YAGATVS SALGKVLWDG 

MINIM I I N N I 11 II II M I I II I I I I N N I I N I I I II i II I II I I 

orf 125-1 AYIGALTGRSSME SVRLS FGKRGSVLFSVANMLQLAGWTAVMI YAGATVS SALGKVLWDG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 125a . pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAXVXD 

I I I I M I I I I M I I I IF M I I i I I I I M I I t I 1 1 J I I I I I t I I I I I Ml I I 

orf 125-1 ES FVWWALANGALI VLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVS D 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 125a . pep GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 

I I I I II II II II 1 1 II 1 1 1 II I I II II I II M 1 1 II N II I II II 1 1 N 1 1 1 1 1 1 1 II N 
orf 125-1 GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 125a . pep TGETDVAKI LLGAGLGAAGI LAWLSTVTTTFLDAYSAGVSANN I SAKLSEI PIAVAVAV 
I M II II M M II II I M I M I II M M II II M I I M I: I M I II I : ::| |:||:|:: 
orf 125-1 TGET DVAKI LLGAGLGAAG I LAWLSTVTTT FLDAYS AGASANN I S ARFAET PVAVGVTL 

250 260 270 280 290 300 

310 320 330 340 

orf 125a . pep VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 

:l MM IM MMM NIIMIMM MMMIMMIMN I II II 
orf 125-1 IGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAGF 
310 320 330 340 350 360 

Homology with a predicted ORF from N. gonorrhoeae

ORF125 shows 86.2% identity over a 65aa overlap with a predicted ORF (ORF125ng) from 
N. gonorrhoeae: 



orf125.pep AGASANNISARFAETPVAVSVTLIGTVLAV 30
I I MM MM II II II I 1:1 IN INN
orf125ng KILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVTLIRTVLAV 308

orf125.pep MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA 64
II II II I : II II I I 111:11 II II II II I : I I
orf125ng MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA 343



An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino 
acid sequence <SEQ ID 806>: 



1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT
301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA*

Further work revealed the following gonococcal DNA sequence <SEQ ID 807>:

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA 

401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT 

501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA 

551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG 

601 CCGCTGGCCG CCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC 

651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT 

701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC 

751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC 

801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA 

851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC 

901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA 

951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA 

1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC 

1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT 

1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT 

1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC

1201 CAATCTTTAC AAAGGAACCC GTCATGA 

This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-1>:

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH

51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA

101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT

151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL

201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI

251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT

301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD

351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT

401 QSLQRNPS*

ORF125ng-1 and ORF125-1 show 95.1% identity in 408 aa overlap:

10 20 30 40 50 60 

orf 12 5-1. pep MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
Ml II lllltl:| I I 1:111 I! I III M IIMIMIMI Mllil! IMII III NMt I 
orfl25ng-l MSGNAS SPSS SAAIGLVWFGAAVS I AE I STGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 125-1 . pep AY I GALTGRS SME S VRLS FGKRG SVL FSVANMLQLAGWTAVMI YAGAT VS SALGKVLW DG 
I I I I I I 1 I 1 I I I I I I M I M I I I I I I I I I f I 1 I I I 1 1 I I I I 1 I r | 1 | | | | | | | | | I | | | 
orfl25ng-l AYIGALTGRS SME SVRLSFGKCG SVL FSVANMLQLAGWTAVMI YVGATVS SALGKVLWDG 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf 125-1. pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS 
I I I I M I I I I I I I I I I II I I I I I : I II I I I I I I I I I I I I I M M : I I I ::: I :: M II 
orfl25ng-l E S FVWWALANGAL I VLWLV FGARRTGGLKT VSMLLMLLAVLWLS VEVFAS SGTNAAP AV S 

130 140 150 160 170 180 

180 190 200 210 220 230 239 

or f 12 5- 1 . pep DGMS FGTAVELSAVMPLSWLPLAADYTRHARRP FAATLTATLAYT LTGCWMYALGLAAAL 
I i 1 : I I I I I I I I I I I I I M I I I I I I 1 I I : I I I I I I I M I I I M I ! M M I I I 1 I I 1 I I II 
orfl25ng-l DGMT FGTAVE LSAVMPLSWL PLAADYTRQARRPFAATLTATLAYTLTGCWMYALGLAAAL 

190 200 210 220 230 240 

240 250 260 270 280 290 299 

orf 125-1. pep FTGETDVAKI LLGAGLGAAG I LAWLSTVTTT FLDAYSAGAS ANNI SARFAET PVAVGVT 
IfllllillMMMM :I!IIMM i I I I I I I i : I I I I I ! I I M 1 I I I I I MINI! 
orfl25ng-l FTGET DVAKI LLGAGLG ITGI LAWLSTVTTTFLDT YS AGASANN I SARFAE I PVAVGVT 

250 260 270 280 290 300 
300 310 320 330 340 350 359

or f 125-1 . pep LIGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 
Ml Ml III Mill 1:11 II I MMMI Mill II III li lllll II III HUM III! 
orfl25ng-l LIGTVLAVMLPVTEYKNFLLLIGSVFAPMAAVLIADFEVIJCRKEEIEGFDFAGLVLWLAG 

310 320 330 340 350 360 

360 370 380 390 400 

or f 125-1 . pep FI LYRFLLS SGWE S S I GLT AP VMS AVAI ATVS VRLFFKKTQS LQRN PSX 
IIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 
orfl25ng-l FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 

370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
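
One common way to look for candidate transmembrane stretches is a sliding-window hydropathy scan, sketched below with the Kyte-Doolittle scale. This document does not state which method was used for its prediction, so the approach, the 19-residue window, the 1.6 threshold and the function names `hydropathy_profile` and `candidate_segments` are all assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; unknown residues (e.g. X, *) score 0."""
    scores = [KD.get(res, 0.0) for res in protein.upper()]
    if window < 1 or window > len(scores):
        return []
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def candidate_segments(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, s in enumerate(hydropathy_profile(protein, window)) if s > threshold]


# Residues 101-150 of ORF125ng-1 <SEQ ID 808>, used only as sample input.
sample = "VMIYVGATVSSALGKVLWDGESFVWWALANGALIVLWLVFGARRTGGLKT"
profile = hydropathy_profile(sample)
starts = candidate_segments(sample)
```

Windows returned by `candidate_segments` are only candidates; a dedicated topology predictor would be needed to support any firm conclusion.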

Example 96

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 809>:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG 

151 CGTGCAGCGG A.ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT . ACGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC . CAG 

451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG . . 

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA 

51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK

101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ 

151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 811>:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC 

701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 
This corresponds to the amino acid sequence <SEQ ID 812; ORF126-1>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK

101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT 

301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA

351 PERDKESGLA YIRRQD* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF126 shows 90.0% identity over a 180aa overlap with an ORF (ORF126a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

or f 1 2 6 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 
I I I I I I M If I I I I I I I I ! I I I I M I M I I I I : I I I I I I I I I I I ! I I I I j I I I :||||| 
orfl26a MTRI AI LGGGL SGRLTALQLAEQGYQI AL FDKGCRRGEHAAAYVAAAMLAPAAEAVEAT P 

10 20 30 40 50 60 

70 80 90 100 110 120

orf 126 . pep EWRLGRQS I PLWRG IRCRLNTHTMMQENGSLI VWHGQDKPLS SE FVRHLKRGGXTDDEI 
I I I I I II I I I I I I I I II : I : I : I I I I I I i I ! I I I II I I I I : I I I I I I I I I I : I I I* 
O r f 1 2 6a E WRLGRQX I PLWRG IRCHLKTPAMMXENGSLI VWHGQDKPLSNEFVRHLKRGGVADDXI 

70 80 90 100 110 120

130 140 150 160 170 180

orf126.pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE

I I I M I I I I I I I I I I I I II MM) III MINI: I I I I I I I I I I M I I I I I I I I : I I 
orf 126a VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

130 140 150 160 170 180 

The complete length ORF126a nucleotide sequence <SEQ ID 813> is: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 

51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 

201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG 

251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA 

301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG 

801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 

1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This encodes a protein having amino acid sequence <SEQ ID 814>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK

101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA 

201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA
351 PERDEESGLA YIRRQD* 

ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap: 

10 20 30 40 50 60 

orf 126a . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
IMIMIIM IIMH IMIMI I I 111 IIIMIMMIIIMIM III llllllll III 
or f 1 2 6- 1 MTRIAI LGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 126a . pep EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 
Ml II lit I llllllll:!: I :|| I 1 I I I I I I I II M I I I : I I I I I I I I I I I I I I I 
or f 1 2 6 - 1 EWRLGRQS I PLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 126a . pep VRWRADD I AERE PQLGGRFS DG I YLPTEGQLDGRQI LSALADALDE LNVPCHWEHECAPE 
M llllllll llllll Mill INI IMMMIM 111 INI llll 111 II Mill 1:11 
orf 1 2 6- 1 VRWRADDIAERE PQLGGRFS DGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 1 26a . pep DLQAQYDWLI DCRGYGAKTAWNQS PXXTSTLRGIRGEVARVYT PE ITLNRPVRLLHPRYP 
M I I I I I 1 I I I I I I I I 1 I I I I I II I I I I M I I I I I I I I I I M I I I I I I I 1 I I I I I | I 
orf 1 26-1 GLQAQ Y DWL I DCRGYGAKTAWNQS PERT STLRGIRGEVARVYT PEI T LNRPVRLLHPRYP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 126a. pep LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT 
llllllll I I I I I I I I I I I I I I I I I I I I I I I I I I I | | I | : | | | M I I I I I I II I I I I I I 
orfl26-l LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 126a . pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKXAPERDEESGLA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I l-l I: I I I I I I I I I I I I I I I : f i I I f 
orf 126-1 LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDECESGLA 

310 320 330 340 350 360 



orf126a.pep YIRRQDX
|||||||
orf126-1 YIRRQDX

Homology with a predicted ORF from N. gonorrhoeae

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from 
N. gonorrhoeae: 



orf126.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60
MMmiiiiiiiimmimii iim mmiMmimm mm
orf126ng MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60

orf126.pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 120
iimimiiiiiiiiiiiii 1 1 1 1 1 j 1 1 1 1 1 1 1 1 1 k 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 mil
orf126ng EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 120

orf126.pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 180
MM 11:1 I I I I I II III I llllllll llllll: I I I 1 I t 1 1 I I i | | 1 | i | | | | z I r
orf126ng VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 180

An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino
acid sequence <SEQ ID 816>:



1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS 

251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA

351 PERDEESGLA YIGRQD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 817>: 

1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC 

51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA 

101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG 

201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG 

551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT 

951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ID 818; ORF126ng-1>:

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 

51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA

351 PERDEESGLA YIGRQD* 

ORF126ng-1 and ORF126-1 show 95.1% identity in 366 aa overlap:

10 20 30 40 50 60 

orf 12 6-1 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I 1 1 I I r I | I | I 1 1 I 1 1 I I I 1 1 1 I I 1 t t I II it MM III Milt ill lit Mill III 
orfl26nq-l MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12 6-1 . pep EWRLGRQS I PLWRGIRCRLNTHTMMQENGS LI VWHGQDKPL S SE FVRHLKRGGVADDE I 
||:| IN Ml III II I IN I I I I I I II I I I I I I I I I I I I I M I I 1 I I I I I I I I I I I M I 
orfl26na-l E VIRLGRQS I PLWRGIRCRLNTLTMMQENGSLI VWHGQDKPLS SE FVRHLKRGGVADDE I 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 126-1, pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 
I I I I I i : I I I M i I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I i I : I : 
orfl26ng-l VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 126-1 . pep GLQAQYDWLI DCRG YGAKTAWNQS PEHTSTLRG IRGE VARVYT PEITLNRPVRLLH PRY P 
IIMIII: I III Mill IIMIil II II MIIIMMI II M Ml IMIIHIIII I II 
orfl26ng-l DLQAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

190 200 210 220 230 240 
250 260 270 280 290 300

orf126-1.pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT
I I I I I I I I I I I I I I I I I M I MM I I I I ! I lli II I I I I I: I I M I I I II ■ I I 1:1 I 1 I I
orf126ng-1 LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT

250 260 270 280 290 300

310 320 330 340 350 360

orf126-1.pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA
IMMIlllhl M ! I MINIMI I I Mill II MM Mill Ml MM IMM MM
orf126ng-1 LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA

310 320 330 340 350 360

orf126-1.pep YIRRQDX
I I I I I I
orf126ng-1 YIGRQDX

Furthermore, ORF126ng-l shows homology to a putative Rhizobium oxidase flavoprotein: 

gi|2627327 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli]
Length = 327
Score = 169 bits (423), Expect = 3e-41

Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%)

Query: 3 RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62 

RI V G G++G A QL G+++ L ++ G 
Sbjct: 2 RI LVNGAGVAGL WAWQLYRHGFRVTLAERAGTVGA-GASG FAGGMLAPWCERES AEE PV 60 

Query: 63 IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122 

+ LGR + W + G+L+V G+D F R G DE+ 

Sbjct: 61 LTLGRLAADWWEAA L PGHVHRRGT L WAGGRDTGE LDRFSRRT S - GWEWLDEVA- 113 

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182 

IA EP L GRF ++ E LD RQ L+ALA L++ + + 
Sbjct: 114 IAALEPDLAGRFRRALFFRQEAHLDPRQAIAALAAGLEDARMRLTLG WGES 165 

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242 

+D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y 

Sbjct: 166 DVDHDRWDCTGAA QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218 

Query: 243 IAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302 

I P++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP 
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278 

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331 

+ P R ++E R + +NGL+RHGF+++P 
Sbjct: 279 DNLP — RVTQEGRTLHVNGLYRHGFLLAP 305 
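
The percentages in the BLAST summary above follow directly from the raw counts. The short sketch below recomputes them. The function name `blast_percent` is hypothetical, and the use of integer truncation is an assumption inferred from the reported figures (163/329 is about 49.5% but is reported as 49%), not a statement about how the original report was generated.

```python
def blast_percent(count: int, alignment_length: int) -> int:
    """Integer percentage, truncated rather than rounded (assumed convention)."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // alignment_length


# Counts taken from the summary line of the report above.
stats = {"Identities": 112, "Positives": 163, "Gaps": 25}
for label, count in stats.items():
    print(f"{label} = {count}/329 ({blast_percent(count, 329)}%)")
```

Under this assumption the three values come out as 34%, 49% and 7%, which agrees with the summary line.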

This analysis suggests that the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 97 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 819>:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC
201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGtCGCG CGGG..GCTT TAGACAGTAA ATTCATGTTG
301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA
351 TGAAAATCTA GTAACCTTTA aTTTGCAAGA AGTCCGCCAG TTCGTGTAGT
401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA
451 GTAG

This corresponds to the amino acid sequence <SEQ ID 820; ORF127>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML 
101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK 
151 * 

Further work revealed the following DNA sequence <SEQ ID 821>: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-1>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK*
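
The correspondence between a nucleotide listing such as SEQ ID 821 and its protein such as SEQ ID 822 can be checked by translating the DNA in frame 1 with the standard genetic code. The following is a minimal sketch under stated assumptions: the function name `translate`, the `to_stop` flag and the convention of reporting any codon containing an unreadable base as `X` are choices made for this illustration and are not taken from the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate frame 1 of a DNA string; whitespace in the listing is ignored."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with an ambiguous or missing base are reported as X.
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# First 30 nucleotides of SEQ ID 821, copied from the listing above.
print(translate("ATGACTGATA ATCGGGGGTT TACGCTGGTT"))
```

For the first 30 nucleotides of SEQ ID 821 this gives MTDNRGFTLV, the first ten residues of the protein listed above.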

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF127 shows 98.0% identity over a 150aa overlap with an ORF (ORF127a) from strain A of N.
meningitidis:



                    10        20        30        40        50        60
orf127.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
            ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf127a     MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf127.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL
            |||||||||||||||||||||||||||| || ||||||||||||||||||||||||||||
orf127a     GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL
                    70        80         90       100       110

                   130       140       150
orf127.pep  VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
            |||||||||||||||||||||||||||||||
orf127a     VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
         120       130       140       150

The complete length ORF127a nucleotide sequence <SEQ ID 823> is: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 824>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 



WO 99/24578 



-452- 



PCT/IB98/01665



ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap: 

                     10        20        30        40        50        60
orf127a.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
             ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf127-1     MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf127a.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127-1     GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
                     70        80        90       100       110       120

                    130       140       150
orf127a.pep  TFICKKSASSCSDGLDYFKGNDKDCKLLKX
             ||||||||||||||||||||||||||||||
orf127-1     TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                    130       140       150
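
The identity figure quoted for such an alignment can be reproduced by counting identical columns and dividing by the alignment length. The sketch below does this for ORF127-1 and ORF127a. The function name `percent_identity` is hypothetical, the two sequences are assumed to be already aligned to equal length, and the original alignment program may count the terminal stop position differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue ('-' is a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# ORF127-1 (SEQ ID 822) without the terminal stop, 149 residues.
ORF127_1 = (
    "MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENA"
    "HFMEKFYLQNGRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLK"
    "AVAIDKDKNPFIIKMNENLVTFICKKSASSCSDGLDYFKGNDKDCKLLK"
)
# ORF127a (SEQ ID 824) differs from it only at residue 41 (T instead of A).
ORF127A = ORF127_1[:40] + "T" + ORF127_1[41:]

print(f"{percent_identity(ORF127A, ORF127_1):.1f}% identity "
      f"in {len(ORF127_1)} aa overlap")
```

With a single substitution in 149 positions this gives 99.3%, matching the value stated for ORF127a and ORF127-1.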



Homology with a predicted ORF from N. gonorrhoeae 

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from
N. gonorrhoeae:



orf127.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 60
            |||||||||||||||||||||||||||||||||||||||||||||:||||||||||||||
orf127ng    MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN 60

orf127.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120
            |||||||||||||||||||||||||||| || ||||||||||||||||||||||||||||
orf127ng    GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119

orf127.pep  VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150
            |||||||||||||| |||||||||||||||
orf127ng    VTFICKKSASSCSDRLDYFKGNDKDCKLLK 149



The complete length ORF127ng nucleotide sequence <SEQ ID 825> is:

1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC
201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG
301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA
351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG
401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG



This encodes a protein having amino acid sequence <SEQ ID 826>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK*

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap: 

                      10        20        30        40        50        60
orf127-1.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127ng-1    MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf127-1.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127ng-1    GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
                      70        80        90       100       110       120

                     130       140       150
orf127-1.pep  TFICKKSASSCSDGLDYFKGNDKDCKLLKX
              ||||||||||||||||||||||||||||||
orf127ng-1    TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                     130       140       150

This analysis, including the fact that the predicted transmembrane domain is shared by the 
meningococcal and gonococcal proteins, suggests that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
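
A candidate transmembrane segment of the kind referred to above can be located with a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6. This is a commonly used heuristic swapped in for illustration: the text does not state which prediction method was used, and the function names, window size and threshold are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every window of the given length along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """1-based (start, end) of windows whose mean hydropathy reaches the cutoff."""
    return [(i + 1, i + window)
            for i, mean in enumerate(hydropathy_profile(seq, window))
            if mean >= threshold]


# First 60 residues of ORF127-1 (SEQ ID 822).
ORF127_1_NTERM = ("MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKIN"
                  "AVRAALLENAHFMEKFYLQN")

windows = candidate_tm_windows(ORF127_1_NTERM)
if windows:
    print(f"hydrophobic windows span residues {windows[0][0]}-{windows[-1][1]}")
```

Under these assumptions the hydrophobic stretch LVELISVVLILSVLALIVY near the N-terminus (residues 9-27) scores well above the cutoff. The same stretch is present in the meningococcal and gonococcal sequences listed above.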

Example 98 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 827>:

1 GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT 

51 CAACCAAATG CGGAAAACCC GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT 

101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA 

151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC 

201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG

251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC 

301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT 

351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG 

401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA 

451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT

501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT 

551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT

601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT 

651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC 

701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG. . 

This corresponds to the amino acid sequence <SEQ ID 828; ORF128>: 

1 VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN 

51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS

101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK 

151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL 

201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 829>: 

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT 

501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT

1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA

1851 CGGCGGCGCA TTGCAGTAG


This corresponds to the amino acid sequence <SEQ ID 830; ORF128-1>:

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSHGGA LQ*


Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical integral membrane protein HI0392 of H.influenzae (accession number U32723)
ORF128 and HI0392 show 52% aa identity in 180aa overlap:

Orf128:   1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60
            ++L S IAS IF+Y DFN++RKT+EL+  FLSN YLG  QGYFDLSA+ENPVLHIWSLAV
HI0392:  46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105

Orf128:  61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120
            E Q         I   KK + ++VL  I++ILF IL A+SF+ + FY ++L+QPN YYLS
HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180
             LRFPELL GSLLA+Y    N + Q +     +L+ L    L +CLF+++ +  FIPG+T
HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224

Homology with a predicted ORF from N.meningitidis (strain A)

ORF128 shows 98.0% identity over a 244aa overlap with an ORF (ORF128a) from strain A of N.
meningitidis:

10 20 30 

orfl28.pep VSLASVIASQI FLYEDFNQMRKTVELSAVF 

I I I I I I I I M I I I I I ! I I I I I I I I I I M It 
orfl28a ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF 
60 70 80 90 100 110 

40 50 60 70 80 90 

orfl28.pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 
IMMMIIIIiMI (I llllllllllll I II Ml II II I I II IN tl I II I II I Mill 
orfl28a LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 
120 130 140 150 160 170 

100 110 120 130 140 150 

. orf 128 .pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 
M I I II I hi III I Ml I M MM I MINI I II I II I I I II I I II I I I I I III I III II 



orfl28a 

orfl28.pep 
orfl28a 

orfl28.pep 
orfl28a 

orfl28a 



ILFLI LTAT SFLPSGFYTDILNQPNT YYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 

200 210 220 



180 



190 



190 



200 



210 



RQLLSSLCFGAL1ACLFVIDKHNPFI 

. t I I M | I I t I I 1 I I I i I ! i I i I I 1 I 1 I I I I I I I I I I I I I I 1 I I M I t I I I I I I In 

260 270 280 290 



240 



250 



220 230 240 

VFVGKI S YSLYLYHWI FI AFAPLIRGGKQLGL PA 

VEVGKISYSLYLYIIWIE^ 

320 330 340 350 



300 



310 



KMT FKKAFFCLYLAP S LI LVGYNLYARG I LKQEHLRPL PGAPLAAENHFPETVLT kGDSH 

380 390 400 41U 



360 



370 



The complete length ORF128a nucleotide sequence <SEQ ID 831> is: 



1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC
51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC
151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC
301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA
351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG
401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT
451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT
501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT
551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC
601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT
651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC
701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG
751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG
801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC
951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC
1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC
1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA
1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG
1851 CGACGGCGCA TTGCAGTAG



This encodes a protein having amino acid sequence <SEQ ID 832>: 



1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSRDGA LQ*

ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap: 

orf 128a . pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 
I Ml II I I M Mill ! II I II I t I I Mi Nil M I ! II ! II I II I II I I Ml I MM I I t 
orf 128-1 MQAVRYRPEI DGLRAVAVLSVMI FHLNNRWLPGGFLGVDI FFVI SGFLITGI ILSEI QNG 

orf 128a. pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 
M M | | M M M I | | I | I | | I I | j I 11 I I | I | I | | I | | I | i M M | I I | | I | M I I I I M 
orf 128-1 SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

orf 128a . pep QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 
I M I I II 1 1 I M I M M M M M M M I M 1 1 I I II II I I I HI M M M M M M I II I 
orf 128-1 QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 

orf 128a . pep TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 
M I II M I M II M II 1 1 II II II 1 1 M II M M i M M II II II 1 1 I II M 1 1 II M II 
orf 128-1 SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 

orf 128a . pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 
I 1 1 H ) I 1 1 i 1 I I f ) M II ! 1 1 1 I 1 1 1 I 1 I I f I I I I M I I I 1 1 1 1 1 I I II I I I 1 1 1 1 t I I 
orf 128-1 PGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

orf 128a . pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 
M I II I I I I I I I I I I I I I I I M I I || M I I I I I I I I I I I I I I I I I I ! I I I I II I I I I I M 
orfl28-l S LYLYHWI FI AFAHYITGDKQLGLPAVS AVAALTAGFSLLS YYLI EQPLRKRKMT FKKAF 

orf 128a . pep FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 
M I II M M I I Mill I I M II I M I I I II II I M III I I I MM M I I III M I I M I I 
orf 128-1 FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 

orf 128 a . pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 
I M II M I M I M M I M M II I II I I II I II I I I I I II II II II II I I I II II I II I M 
orf 128-1 DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

orf 128a . pep PVPRFEAQS FL I PG FPARFRET VKRI AAVKPV YVFANNTS I SRSPLREEKLKRFAANQYL 

M IMMM II! Mill MINI II HIM II ill II II II III MINIMI I I 

orf 128-1 PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 

orf 128a . pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 
I I I I I II M I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I | I I I I I I | N I I I I I | I 
orf 128-1 RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

orf 128a . pep YMGRE FHKHERLLKS SRDGALQX 
M II I Ml I INN II: Mill 
orfl28-l YMGRE FHKHERLLKS SHGGALQX 

Homology with a predicted ORF from N. gonorrhoeae

ORF128 shows 93.4% identity over 244 aa overlap with a predicted ORF (ORF128ng) from N. 
gonorrhoeae: 

orf 128. pep VSLASVIASQI FLYEDFNQMRKTVELSAVF 30 

I I I I I I I I I I I I ! II I I I I I I II: M I: I I 
orfl28ng I LSE IQNGS FS FRDFYTRRIKRI YPAFI AAVS LAS VI ASQI FLYE DFNQMRKT IELST VF 112 

orf 128 .pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90 

IMINII: I M I I I I I I I I I I I I I II I I I I M I I I II II M I I I I I II I Ml I I I M 
orfl28ng LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172 

orf 128 .pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150 

M 11 I I II I I I I I : M I I I I I I I 1 I M I I 1 I M I I M I r | M M I I I I I I I I t 1 M I I I 
orfl28ng ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232 

. orf 128. pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210 
MM) II ! N I I M II II II I: II I I I : II I II II M II N I N II I I I I II N II I II 
orfl28ng RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292 
orf128.pep   VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 244
orf128ng     VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 352


The complete length ORF128ng nucleotide sequence <SEQ ID 833> is: 



1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC
51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC
151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC
301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA
351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG
401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT
451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT
501 GCGTAATATC AGCATGATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT
551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC
601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT
651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC
701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG
751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG
801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC
951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC
1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT
1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC
1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG
1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA
1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG
1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG
1851 AGGCGGCGCA TTGCAGTAG



This encodes a protein having amino acid sequence <SEQ ID 834>: 



1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY
201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV
251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN
401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG
551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKHSRGGA LQ*



ORF128ng and ORF128-1 show 95.7% identity in 622 aa overlap: 



orf 128-1. pep 
orfl28ng 
orf 128-1. pep 
orfl28ng 



MQAVRYRPEI DGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGI I LSEIQNG 
MQAVRYRPEIDGL^^ 

SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 
1 1 1 M I ! 1 M n M H 1 1 1 1 1 1 1 H I I II 1 1 1 M M I 

SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF 
orf128-1.pep
orfl28ng 
orfl28-l.pep 
orfl28ng 
orf 128-1. pep 
orfl28ng 
orf 128-1. pep 
orfl28ng 
orf 128-1. pep 
orfl28ng 
orf 128-1. pep 
orf 128ng 



QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 
: t I I J t I t I I f I I 1 I I t I I I I I 1 I I t i I I I I I I t I 1 I I I I ! I I I I I I I I II I I I 1 I I I 
RLG Y FDLS ADEN PVLH I WSLAVEEQYYLLYPLLL I FCYKKTKSLRVLRN I S 1 1 LFLI LTA 

SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 
I III 1:1 I Mill ! I I Ml t I I Mill I II: I II I I I I I I I I t I II I I llllllil II 
S SFLPAGFYT DI LNQPNTYYLSTLRFPELLVG SLLAVYGQTQNGRRQTENGKRQLLSLLC 

FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 
M I I 1 : 1 1 I M M I : I M M : M I I M I M M I M M I I 1 I M M I M M M M I M M I 
FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 
I I M I 1 1 I I I I II I I I M I f I I I J I 1 1 1 1 1 I 1 1 1 11 I 1 I M J I I i I 1 I I 1 1 f 1 1 1 I [ I I I 
S LYLYHW I FIAFAHYITGDKQLGLPAVSAVAALTAG FS LLS YYLI EQPLRKRKMTFKKAF 

FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 
I I M M I I 1 : 1 II I I I I : I 1 I I M I I I I II I h I : I II I : I I I II I M I M I I 11 I II I I 
FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL 

DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQETDLRMGGQ 
lllhlll III MIIIMMII I Mill Ml III III I II Mill III I Ml II MM I I 
DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 



orf 128-1 . pep PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 
I I M I II II II I M I I I I M I II I I I M M M I I I II I Ml I I II II I I M I I I MM 
or f 1 2 8ng PVPRFEAQS FLI PGFKARFRETVKRIAAVKPVYVFANNTS I SRS PLREEKLKRFAINQYL 



orfi28-l.pep 
orfl28ng 
orf 128-1. pep 
orfl28ng 



RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 
I M : I I I II II II II M M M i II II I I M M II 1 I 1 I II I I M I M M I II II I i M I I 
RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY 

YMGREFHKHERLLKSSHGGALQX 
I I II M I I I II I M I : M I M I 
YMGREFHKHERLLKHSRGGALQX 
610 620 






In addition, ORF128ng shows homology to a hypothetical H.influenzae protein:

sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20)
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus
influenzae] Length = 245
Score = 239 bits (604), Expect = 3e-62

Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%)

Query:  38 VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE 97
           +DIFFVISGFLIT II++EIQ  SFS + FYTRRIKRIYP                F+Y
Sbjct:   1 MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60

Query:  98 DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157
           DFN++RKTIEL+  FLSN YLG   GYFDLSA+ENPVLHIWSLAVE Q         I
Sbjct:  61 DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA 120

Query: 158 YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV 217
           YKK + ++VL  I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+
Sbjct: 121 YKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLSNLRFPELLVGSLLAI 180

Query: 218 YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVIDKHDPFIPGIT 262
           Y    N + Q       +L++L    L  CLF+++ +  FIPGIT
Sbjct: 181 YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224


This analysis, including the identification of several putative transmembrane domains, suggests that
these proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.

Example 99

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 835>: 

1 ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT 

51 ggggctgacg GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC 

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT 

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT 

251 TTCCGTTTTT CGTC. . 

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>: 

1 IIYEYRWMFL YGALTTLGLT VVAXAGGSVL GLLLALARLI HLEKAGAPMR
51 VLAWALRKVS LLYVTLFRGT PLFVQIVIWA YVWFPFFV..



Further work revealed the complete nucleotide sequence <SEQ ID 837>: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT 

10 , CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

13 i5i GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 
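
The numbered listing format used for these sequences (a 1-based start position followed by blocks of ten residues) can be checked mechanically. The following is a minimal sketch, not part of the specification, that joins the rows of such a listing into one string and verifies that each row's start position agrees with the number of residues read so far. The function name `parse_listing` is hypothetical.

```python
import re


def parse_listing(text: str) -> str:
    """Join numbered listing rows into one sequence, checking row indices."""
    sequence = ""
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*\S)", line)
        if not match:
            continue  # blank rows between listing lines
        start = int(match.group(1))
        residues = match.group(2).replace(" ", "")
        if start != len(sequence) + 1:
            raise ValueError(
                f"row starts at {start}, expected {len(sequence) + 1}"
            )
        sequence += residues
    return sequence


listing = """
1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT
"""
print(len(parse_listing(listing)))  # 100
```

A row whose index does not match the running length raises an error, which is a convenient way to spot a dropped or duplicated block of ten.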

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF129 shows 98.9% identity over an 88 aa overlap with an ORF (ORF129a) from strain A of N.
meningitidis:

                          10        20        30        40        50
orf129.pep        IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                  |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129a     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                    10        20        30        40        50        60

                60        70        80
orf129.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV
            ||||||||||||||||||||||||||||||||||
orf129a     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a     SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
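
The percentage identity quoted for an overlap can be recomputed from the aligned residues. The sketch below is an illustration only and assumes that identity is the number of identical columns divided by the overlap length, with an unread residue (X) counted as a mismatch; the alignment program used in the specification may apply different conventions. The function name `percent_identity` is hypothetical, and the strain A segment is derived here from the ORF129 string by substituting T for the single X.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns holding the same residue in both rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Columns that are gaps in both rows carry no information.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# The 88 aa overlap of ORF129, written in blocks of ten.
orf129 = (
    "IIYEYRWMFL" "YGALTTLGLT" "VVAXAGGSVL" "GLLLALARLI" "HLEKAGAPMR"
    "VLAWALRKVS" "LLYVTLFRGT" "PLFVQIVIWA" "YVWFPFFV"
)
# Strain A has T where the partial sequence has the unread residue X.
orf129a_segment = orf129.replace("X", "T")

print(f"{percent_identity(orf129, orf129a_segment):.1f}%")  # 98.9%
```

With 87 identical columns out of 88, the result rounds to the 98.9% stated above.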

The complete length ORF129a nucleotide sequence <SEQ ID 839> is: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 
51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT 



101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This encodes a protein having amino acid sequence <SEQ ID 840>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a and ORF129-1 show 100.0% identity in 248 aa overlap:

orf129a.pep MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1    SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1    EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep KRYNPQHRX
            |||||||||
orf129-1    KRYNPQHRX


Homology with a predicted ORF from N. gonorrhoeae 

ORF129 shows 98.9% identity over an 88 aa overlap with a predicted ORF (ORF129ng) from
N. gonorrhoeae:

orf129.pep        IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54
                  |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129ng    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60

orf129.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88
            ||||||||||||||||||||||||||||||||||
orf129ng    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120

An ORF129ng nucleotide sequence <SEQ ID 841> was predicted to encode a protein having amino 
acid sequence <SEQ ID 842>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF

101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN

151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence <SEQ ID 843>: 



1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA 



51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT 

101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ACSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT AALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 and ORF129-1 show 99.2% identity in 248 aa overlap:

orf129-1.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129-1.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129-1.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
              ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf129ng-1    SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129-1.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
              ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf129ng-1    EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE

orf129-1.pep  KRYNPQHRX
              |||||||||
orf129ng-1    KRYNPQHRX

In addition, ORF129ng-l is homologous to an ABC transporter from A.fulgidus: 

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP)
[Archaeoglobus fulgidus] Length = 224
Score = 132 bits (329), Expect = 2e-30

Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%)

Query: 65 VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124 

+S YV + RGTPL VQI+I +F P+ GI + E A G +AL 

Sbjct: 58 ISTAYVEVIRGTPLLVQILI VYFGLPAIGINLQPEPA GIIAL 99 

Query: 125 IANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184 

SGAYI EI RAGI+SI GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI 
Sbjct: 100 SICSGAYIAEIVRAGIESIPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242 

LLKDSSLLSVI ++ EL V I P AL YL+MT L + +K+ 

Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217
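
The summary line of the database hit can be cross-checked against its own counts. The sketch below is illustrative only: it parses a summary line of the form shown above and recomputes each percentage by integer division. That the reported figures agree with truncation rather than rounding is an observation about these three values, not a statement about how the search program works internally. The names `SUMMARY` and `parse_summary` are hypothetical.

```python
import re

# Summary line as reported for the A.fulgidus hit.
SUMMARY = (
    "Identities = 86/178 (48%), Positives = 103/178 (57%), "
    "Gaps = 18/178 (10%)"
)


def parse_summary(line: str) -> dict:
    """Return {name: (count, overlap length, reported percent)}."""
    stats = {}
    pattern = r"(\w+) = (\d+)/(\d+) \((\d+)%\)"
    for name, count, length, percent in re.findall(pattern, line):
        stats[name] = (int(count), int(length), int(percent))
    return stats


for name, (count, length, percent) in parse_summary(SUMMARY).items():
    # Integer division reproduces the reported value in each case here.
    print(name, count * 100 // length, percent)
```

For example, 103/178 is 57.9%, which the summary reports as 57%.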

This analysis, including the identification of transmembrane domains in the two proteins, suggests
that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.
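
The specification does not state how the transmembrane domains were identified. Purely as an illustration of one common approach, the sketch below scans a protein with a sliding window and reports windows whose mean Kyte-Doolittle hydropathy exceeds a threshold. The window length of 19 and the threshold of 1.6 are conventional choices assumed here, and the function name `hydrophobic_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return 1-based start positions of windows with high mean hydropathy."""
    starts = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        # Unknown residues (e.g. X) are scored as neutral (assumption).
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean >= threshold:
            starts.append(i + 1)
    return starts


# Residues 1-60 of ORF129-1 as listed above.
orf129_1_nterm = (
    "MDFRFDIIYE" "YRWMFLYGAL" "TTLGLTVVAT"
    "AGGSVLGLLL" "ALARLIHLEK" "AGAPMRVLAW"
)
print(hydrophobic_windows(orf129_1_nterm))
```

Runs of consecutive start positions indicate a candidate membrane-spanning stretch; a prediction of this kind would normally be confirmed with a dedicated tool.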
Example 100

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 845>:

1 . . CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA 
51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT 
101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC 
151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA
201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcTAgT 
251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT 
301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc 
351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA 
401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC 
451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC 
501 TGACCGCCGC CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT 
551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr 

This corresponds to the amino acid sequence <SEQ ID 846; ORF130>: 

1 ..LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI
51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL 
101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA 
151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE* 

Further work revealed the complete nucleotide sequence <SEQ ID 847>: 

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG 

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA 

551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC 

601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT 

801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC 

901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT 

951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG 

1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT 

1051 GCGTTTACAG ACGATCCGGA ATAA 
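
Whether a nucleotide sequence is a complete open reading frame can be checked with a few simple tests. The sketch below is not from the specification; it assumes the standard stop codons and reports whether a sequence begins with ATG, has a length divisible by three, ends in a stop codon, and is free of internal stops. The function name `orf_summary` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_summary(dna: str) -> dict:
    """Basic completeness checks for a candidate open reading frame."""
    dna = dna.upper().replace(" ", "").replace("\n", "")
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    ends_with_stop = bool(codons) and codons[-1] in STOP_CODONS
    return {
        "starts_with_atg": dna.startswith("ATG"),
        "in_frame": len(dna) % 3 == 0,
        "ends_with_stop": ends_with_stop,
        "internal_stops": sum(c in STOP_CODONS for c in codons[:-1]),
        # The terminal stop codon does not encode a residue.
        "residues": len(codons) - (1 if ends_with_stop else 0),
    }


# First three codons of the sequence above followed by its stop codon.
print(orf_summary("ATGCGGCCG TAA"))
```

For the complete sequence above, the last row starts at position 1051 and holds 24 bases, giving 1074 nucleotides, or 358 codons including the stop. This agrees with the 357 residues of the encoded protein.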

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>: 

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL

51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC

101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM

151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP

201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT

251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR

301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN

351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF130 shows 94.3% identity over a 193aa overlap with an ORF (ORF130a) from strain A of N.
meningitidis:
                                                  10        20        30
orf130.pep                                LKECRLKDPVFIPNIVYKNIAITFLLLHAA
                                          ||||||||||||||:|||||||||||||||
orf130a     LNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVVYKNIAITFLLLHAA
               140       150       160       170       180       190

                    40        50        60        70        80        90
orf130.pep  AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX
            |||||||||||||:|||||||||||||||||||||||||||||||||||||| ||||||
orf130a     AELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK
               200       210       220       230       240       250

                   100       110       120       130       140       150
orf130.pep  LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
            ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf130a     LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
               260       270       280       290       300       310

                   160       170       180       190
orf130.pep  FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX
             | |||| ||||||||||||||||||::|:||||||||||||||
orf130a     VLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPEX
               320       330       340       350

The complete length ORF130a nucleotide sequence <SEQ ID 849> is:

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG 

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA 

551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT 

601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA 

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT 

801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC 

901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT 

951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG 

1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 850>: 



1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL

51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC

101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM

151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP

201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT

251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR

301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN

351 AFTDDPE*

ORF130a and ORF130-1 show 98.3% identity in 357 aa overlap: 



orf130a.pep MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1    MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL

orf130a.pep KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1    KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA

orf130a.pep AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf130-1    AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV

orf130a.pep YKNIAITFLLLHAAAELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
            |||||||||||||||||||||||||||:||||||||||||||||||||||||||||||||
orf130-1    YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130a.pep LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR
            ||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||
orf130-1    LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130a.pep IAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPE
            |||||||||||||| |||||||||||||||||||||||||:||:|||||||||||||
orf130-1    IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPE



Homology with a predicted ORF from N. gonorrhoeae

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from
N.gonorrhoeae:

orf130.pep                                LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30
                                          ||||||||||||||::||||||| ||||||
orf130ng    LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVIYKNIAIT-LLLHAA 201

orf130.pep  AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90
            |||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
orf130ng    AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 261

orf130.pep  LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150
            |||||||||||||||||| |||||||||||||||||||||||||||||| ||||:|||||
orf130ng    LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321

orf130.pep  FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193
             | |||| |||||| |||||||:|||::|:|||||||||||||
orf130ng    VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 364

An ORF130ng nucleotide sequence <SEQ ID 851> was predicted to encode a protein having amino 
acid sequence <SEQ ID 852>: 



1 MNKFFTHPMR PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI

51 RRFFDYRFVG PDGFFRQPET CRYFDGGVVA CCGCFIAVFT ATCRIFRRRL

101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV

151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA

201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA

251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL

301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF

351 VPIFRANAFT DDPE*



Further work revealed the following gonococcal DNA sequence <SEQ ID 853>: 



1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT 

51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG 

151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT 

201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC 

251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT 

351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG 

401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG 

451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA 

551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG 

601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA 

651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA 

701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC 

751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC 

801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC 

851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC 
901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA

951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG 

1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG 

1051 TTTACAGACG ATCCGGAATA A 

This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-1>:

1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL

51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC

101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM

151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA

201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG

251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI

301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA

351 FTDDPE*

ORF130ng-1 and ORF130-1 show 92.4% identity in 357 aa overlap:

orf130-1.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
              ||||||||||||||||||||||||||:||||||||||||||||||||:|||| |||||||
orf130ng-1    MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL

orf130-1.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
              ||:|||||:|||:|:::||| || |:||||||||||||||| ||||||||||||||||||
orf130ng-1    KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDNFALLMLLA

orf130-1.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV
              |||||||||||||||||||||||||||||||||||||:|||:|:||||||||||||||::
orf130ng-1    AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVI

orf130-1.pep  YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
              ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130ng-1    YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130-1.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR
              |||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
orf130ng-1    LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130-1.pep  IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX
              ||| ||||:||||| ||||||||||||| |||||||:|||:||:||||||||||||||
orf130ng-1    IAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 101 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 855>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAG. . 

This corresponds to the amino acid sequence <SEQ ID 856; ORF131>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI 
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR 
101 TRDGKPLIET FKQGGFDCLE K. . 

Further work revealed the complete nucleotide sequence <SEQ ID 857>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 
101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA

351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 858; ORF131-1>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF131 shows 95.0% identity over a 121aa overlap with an ORF (ORF131a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf131.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD
            |||||||||||||||||||||||||||||||||:|||||||||||||||||||||||| |
orf131a     MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf131.pep  YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE
            ||||||||| ||||||||||||||||||||||| ||||||||||||||||||| |||||:
orf131a     YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK
                    70        80        90       100       110       120

orf131.pep  K
            |
orf131a     KQGLRRNGLSERVRWX
                   130

The complete length ORF131a nucleotide sequence <SEQ ID 859> is: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA 

351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 860>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR 
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW* 

ORF131a and ORF131-1 show 97.0% identity in 135 aa overlap: 

orf131a.pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED
            |||||||||||||||||||||||||||||||||:|||||||||||||||||||||||| |
orf131-1    MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131a.pep YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||:
orf131-1    YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131a.pep KQGLRRNGLSERVRWX
            ||||||||||||||||
orf131-1    KQGLRRNGLSERVRWX

Homology with a predicted ORF from N. gonorrhoeae 

ORF131 shows 89.3% identity over 121 aa overlap with a predicted ORF (ORF131ng) from 
N. gonorrhoeae: 

orf131.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 60
            ||||:||||| |||:||||||||||||||| ||:||||||||||||||||||||| || |
orf131ng    MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED 60

orf131.pep  YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 120
            ||||||||| |||||||||||:||||||||||| |||||||||||||:| ||| ||||||
orf131ng    YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 120

orf131.pep  K 121
            |
orf131ng    KQGLRRNGLSERVRW 134

A complete length ORF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a 
protein having amino acid sequence <SEQ ID 862>: 

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC LSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR 
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 

1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GtccgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT 

251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA 

351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 864; ORF131ng-1>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR 
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

ORF131ng-1 and ORF131-1 show 92.6% identity in 135 aa overlap:

orf131ng-1.pep  MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED
                ||||:||||| |||:||||||||||||||||||:||||||||||||||||||||| || |
orf131-1        MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131ng-1.pep  YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE
                |||||||||||||||||||||:|||||||||||||||||||||||||:| ||| ||||||
orf131-1        YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131ng-1.pep  KQGLRRNGLSERVRWX
                ||||||||||||||||
orf131-1        KQGLRRNGLSERVRWX

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
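
The specification does not say which tool predicted the lipid attachment site. As an illustration only, the sketch below searches for a consensus written here from memory of the PROSITE prokaryotic membrane lipoprotein pattern, {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C; both the pattern and its use here are assumptions. The additional positional rules that accompany that pattern are not implemented, and the names `LIPOBOX` and `find_lipid_attachment_cys` are hypothetical.

```python
import re

# Regular-expression form of the assumed consensus: six residues other
# than D, E, R or K, then the lipobox ending in the modified cysteine.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipid_attachment_cys(protein: str):
    """Return the 1-based position of the candidate Cys, or None."""
    match = LIPOBOX.search(protein)
    # The cysteine is the last residue matched, so its 1-based
    # position equals the 0-based end offset of the match.
    return match.end() if match else None


# Residues 1-50 of ORF131-1 as listed above.
orf131_1_nterm = (
    "MEIRAIKYTA" "MAALLAFTVA" "GCRLAGWYEC" "SSLTGWCKPR" "KPAAIDFWDI"
)
print(find_lipid_attachment_cys(orf131_1_nterm))  # 22
```

In ORF131-1 the match ends at Cys-22, within the segment AALLAFTVAGC, which is the kind of signal-peptide cleavage region on which such a prediction rests.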
Example 102

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 865>:

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC 

401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG

451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG 

501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA 

551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA 

601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT 

651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG 

701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA 

751 AAAATTCGGC ACGGAACACG GCTGGCA. . 

This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP 

151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR 

201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG 

251 KIRHGTRLA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 867>: 

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC 

401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG 

801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG 

851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC 

901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC 

951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC 

1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG 

1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 GGAAAGCTGC TGGAAGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 868; ORF132-1>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 

251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA 

301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 




351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV 
401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH 
451 GKLLEALR* 
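
The correspondence between the nucleotide listing and the amino acid listing can be illustrated with a short translation sketch. It assumes the standard genetic code and reports any codon containing an ambiguous base as `X`, which appears to be the convention used in the listings here; the function name `translate` is illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 24 and last 18 bases of the complete sequence <SEQ ID 867>.
print(translate("ATGAAACACATCCATATTATCGGT"))
print(translate("CTGGAAGCTTTGAGATAG"))
```

The stop codon is reported as `*`, matching the terminal `*` in the amino acid listing.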

Computer analysis of this amino acid sequence gave the following results: 
Homology with the hypothetical o457 protein of E.coli (accession number U14003) 
ORF132 and o457 show 58% aa identity in 140 aa overlap: 

Orf132: 4   IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63 
            IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE  GI++ +G+DA+QL+  +
o457:   3   IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61 

Orf132: 64  ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123 
             D+ +IGN   RG   VEA+L   +PY+SGPQWL + VL   WVL VAGTHGKTTTA M
o457:   62  PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121 

Orf132: 124 AWVLEYAGLAPGFLIGGVXG 143 
             W+LE  G  PGF+IGGV G
o457:   122 TWILEQCGYKPGFVIGGVPG 141 
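
The identity figures quoted for these alignments are the share of aligned columns in which both sequences carry the same residue. A minimal sketch of that calculation follows; it assumes gaps are written as `-`, that an unknown residue `X` never counts as a match, and that the denominator is the full alignment length. The values reported in this document came from the original alignment software, whose exact conventions may differ.

```python
def percent_identity(query: str, subject: str):
    """Return (identities, columns, percent) for two pre-aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    columns = len(query)
    identities = sum(
        1 for q, s in zip(query, subject)
        if q == s and q not in "-X"  # gaps and unknown residues never match
    )
    return identities, columns, 100.0 * identities / columns


# Final block of the ORF132 / o457 alignment (residues 124-143 vs 122-141).
print(percent_identity("AWVLEYAGLAPGFLIGGVXG", "TWILEQCGYKPGFVIGGVPG"))
```

Applied block by block, the identity counts can be summed and divided by the total overlap to check a reported overall percentage.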

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF132 shows 74.6% identity over a 189aa overlap with an ORF (ORF132a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 j 

orf 132 . pep bOCHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLElALGIDVYEGFDAAQLD j 

I | I { I I I I I I I I I I I I : I I I I I I I I I I 1 i I I I I I ( I I I I I I I I I I I I S I I I I I : I < I i 
orf 132a MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 132 . pep EFKADVYVI GNVAKRGMDWEAI LNLGLPY I SGPQWLSENVLHHHWVLGVAGTHGKTTT A 
I I 1 I 1 I I I 1 I 1 I 1 I I I I 1 I I f 1 I I I 11111111111:11 Hill I I I I MINIM 
orf!32a E FKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGBCTTTA 

70 80 90 100 110 120 

130 140 150 160 
orf 132 . pep SMLAWVLEYAGLAPGFLIGGVXGKFR RFRPPAANAAPRPEQPI AVFR 

I I I M I I M M It M I MM : I Ml: I : : I : > 1 

orf 132a SMLAWVLEYAGLAPGFXIGGVPENFSVSARL-PQTPRQDPNSQSPFFVIEADEYDTAFFD 

130 140 150 160 170 

170 180 190 200 210 220 

orf 132 . pep HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL 

:IM :::| 

orf 132a KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD 
180 190 200 210 220 230 

The complete length ORF132a nucleotide sequence <SEQ ID 869> is: 

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT 

51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG 

151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC 

301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC 

351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC 

401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG 

801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA 

851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC 

901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA 

1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 



This encodes a protein having amino acid sequence <SEQ ID 870>: 

1 MKHIHIIGIG GTFMGGIAAI AKEAGFEXSG CDAKMYPPMS TQLEALGIGV 
51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN 
101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR 
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 
251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA 
301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA 
401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH 
451 TKLLDALR* 



ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap: 



orf 132a. pep MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 
I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I 
orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

orf 132a . pep EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I: II I II I I I I It I I I I I I II 
orfl32-l E FKADVYV I GNVAKRGMDWEAI LNLGLPYI SGPQWLSENVLHHHWVLGVAGTHGKTTTA 

or f 132a . pep SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
IMMI II MINIM I II III II: II I II II Ml I II MM Ml Mill MM II II I 
orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 

orf 132a . pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT 
M I I M M M I I II II II M I II I I I I M II II II I : I M I II I M II I II II I I II II I 
orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 



orf 132a . pep LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA 
M II I II II I I I I I II II I II I II M II II M II II MM 1:111 II II I I I I II I I 
orf 132-1 LDKGCWT PVEKFGTEHGWQAGEANADGS FDVLLDGKTAGRVKWDLMGRHN RMNALAVIAA 

orf 132a . pep ARHAGVDIQTACEALSTFKNVKRRMEIKGTANGITVYDDFAHHPTAIETT IQGLRQRVGG 
M UNI M 1M M I:: M I MM M I I II I ill II M I I I Ml I I MM I I M M MM 
orf 132-1 ARHVGVDIQTACEALGAFKNVKRRME IKGTANGITVYDDFAHHPTAIETT IQGLRQRVGG 

orf 132a . pep ARI LAVLE PRSNTMKLGTMKAAL PAS LKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD 
M II II I I II I I II I II M I : I II M II II II I I M I I : I I I 1 I I 1 I I t I I I k I : I I I I 
orf 132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

orf 132a. pep FDAFVAEIVKNAEAGDHILVMSNGGFGGIHTKLLDALRX 
I M II M I II II I : I I I II M I II I M I M 111:1111 
orfl32-l FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N. 
gonorrhoeae: 

orf 132 .pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 60 

I 11 II I III M M M I : II M M II I : II II II M II II II I II II II I: II I II II I : 
orf132ng MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60 

orf132.pep EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 120 

| |:||:M 11111:1 II II Ml Mi M I I I I I I I I I : I I I I I I I I I I I I I M ! I I I I I I 
orf132ng EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 120 

orf132.pep SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ 180 

I I ! I ! I I I [ I I 1 1 I I 1 1 I 1 1 1 IMMIMIMIM Mil IMIMMMIIIIIIM 
orf 132ng SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ 180 

orf 132 . pep TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY 240 

I : MM I II I I II II I I! I II I I I I II I II I I I I I : I : : I : M I I II I M I II 
orfl32ng TLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRPHRLQRTAAKPARY 240 

orf 132 . pep FGQRLLDAGGKIRHGTRLA 259 

K I I 1 I I I I I t I I 1 I I II I 
orfl32ng FGQRLLDAGGKIRHRTRLADW 261 

An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino 
acid sequence <SEQ ID 872>: 

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV 

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP 

151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR 

201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG 

251 KIRHRTRLAD W* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 873>: 

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT 

51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA 

151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA 

201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac 

301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac 

351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC 

401 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC 

451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG 

801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG 

851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC 

901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT 

1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA 

1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-1>: 

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV 

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE 

251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA 

301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA 

401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH 

451 TKLLDALR* 






ORF132ng-1 and ORF132-1 show 93.2% identity in 458 aa overlap: 

orf132ng-1.pep MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 
Ml' II I M III M ||:| MINI 11:111 I! I ! I !l II Ml!; MM I: Ill: 
orf132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

orf132ng-1.pep EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 
I I ? I i = I I I I t I I : ! I I I ! I 1 I I I ! II MM Ml II: I I till I II I 111 III III III 
orf132-1 EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 

orf132ng-1.pep SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK 
II I I II I 1 1 1 II I II II I 1 1 1 I II M 1 1 M M II M M I M : 1 1 II I I M M 1 1 1 1 II 1 1 
orf132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 

orf132ng-1.pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDT 
I I M I M I M I I I II I I i II I II I M I II M I I II I : I I I I II I II I I I M I : I M I M I 
orf132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

orf132ng-1.pep LDKGCWTPVEKFGTGHGWQIGEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAVIAA 
I I I M II I I I II I I MM II : M I II II II M I I t I : I I I I II I I II If I M il I 
orf132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA 

orf132ng-1.pep ARHAGVDVQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 
I M : I I I : I I I I II II I II II I I II II M I I I I II I II M I I II I II I I M II I II I II I 
orf132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

orf132ng-1.pep ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD 
t I 1 I 1 I I I 1 t S 1 I I I I 1 I 1 I I I I I r I I 1 1 I I t I I I t i I 1 : t I I I I I I I I I I K II MM 
orf132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

orf132ng-1.pep FDTFVAEIVKNARTGDHILVMSNGGFGGIHTKLLDALRX 
11:1111 I I I I I I 1 f I I I I I I I I I I I I I 111:1111 
orf132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX 

In addition, ORF132ng-1 is homologous to a hypothetical E.coli protein: 

pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003) 
ORF_o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein 
in fbp-pmba intergenic region [Escherichia coli] Length = 457 
Score = 474 bits (1207), Expect = e-133 
Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%) 

Query: 22  KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDVVE 81 
           ++ G +V+G DA +YPPMST LE  GI + +G+DA+QLE  Q D+ +IGN   RG   VE
Sbjct: 21  RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-QPDLVIIGNAMTRGNPCVE 79 

Query: 82  AILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTASMLAWVLEYAGLAPGFLIGGV 141 
           A+L + +PY+SGPQWL + VL   WVL VAGTHGKTTTA M  W+LE  G  PGF+IGGV
Sbjct: 80  AVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMATWILEQCGYKPGFVIGGV 139 

Query: 142 PENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDKRSKFVHYRPRTAVLNNLEFDH 201 
           P NF VSA L          +S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH
Sbjct: 140 PGNFEVSAHL---------GESDFFVIEADEYDCAFFDKRSKFVHYCPRTLILNNLEFDH 190 

Query: 202 ADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDTLDKGCWTPVEKFGTGHGWQIG 261 
           ADIF DL AIQ QFHHLVR VP +G I+      +L+ T+  GCW+  E  G    WQ
Sbjct: 191 ADIFDDLKAIQKQFHHLVRIVPGQGRIIWPENDINLKQTMAMGCWSEQELVGEQGHWQAK 250 

Query: 262 EVNADGS-FDVLLDGKKAGHVAWDLMGGHNRMNALAVIAAARHAGVDVQTACEALGAFKN 320 
           ++  D S ++VLLDG+K G V W L+G HN  N L  IAAARH GV    A  ALG+F N
Sbjct: 251 KLTTDASEWEVLLDGEKVGEVKWSLVGEHNMHNGLMAIAAARHVGVAPADAANALGSFIN 310 

Query: 321 VKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG-ARILAVLEPRSNTMKLGTM 379 
            +RR+E++G ANG+TVYDDFAHHPTAI  T+  LR +VGG ARI+AVLEPRSNTMK+G
Sbjct: 311 ARRRLELRGEANGVTVYDDFAHHPTAILATLAALRGKVGGTARIIAVLEPRSNTMKMGIC 370 

Query: 380 KSALPASLKEADQVF-CYAGGADWDVAEALAPLGCRLRVGKDFDTFVAEIVKNARTGDHI 438 
           K  L  SL  AD+VF        W VAE             D DT    +VK A+ GDHI
Sbjct: 371 KDDLAPSLGRADEVFLLQPAHIPWQVAEVAEACVQPAHWSGDVDTIADMVVKTAQPGDHI 430 

Query: 439 LVMSNGGFGGIHTKLLDAL 457 
           LVMSNGGFGGIH KLLD L
Sbjct: 431 LVMSNGGFGGIHQKLLDGL 449 

Based on this analysis, it was predicted that these proteins from N.meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF132-1 (26.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the 
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These 
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen. 

Example 103 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 875>: 

1 . .CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA 

51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC 

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC 

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG 

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG 

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA 

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT 

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA 

401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC 

451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA 

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT 

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC 

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT 

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG 

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC 

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG 

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG 

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT 

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC 

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG 

1001 GCAATGATGC GGCAAC . GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC 

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA 

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA 

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA 

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>: 

1 . . PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS 

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ 

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV 

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN 

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG 

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI 

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD 

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF* 

Further work revealed the further partial DNA sequence <SEQ ID 877>: 

1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA 

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT 

301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA 

351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG 

401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT 

451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA 

501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG 

551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT 

601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA 

651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT 

701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG 

751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT 

801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg.CaCCG CAATACGACA 

851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG 

901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG 

951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC 

1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC 

1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC 

1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAAATCC 

1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG 

1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAAAAACCG 

1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG 

1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC 

1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT 

1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA 

1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC 

1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA 

1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT 

1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC 

1651 GGGGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC 

1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA 

1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC 

1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT 

1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT 

1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT 

1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG 

2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT 

2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC 

2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG 

2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG 

2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT 

2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC 

2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA 

2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC 

2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT 

2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA 

2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA 

2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG 

2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA 

2651 TGAGCTACAA GTTTTAA 



This corresponds to the amino acid sequence <SEQ ID 878; ORF133-1>: 

1 EAQIQVLEDV HVKAKRVPKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI 
51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG 
101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN 
151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN 
201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW 
251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL 
301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT 
351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL 
401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP 
451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGYYG 
501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF 
551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT 
601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL 
651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS 
701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY 
751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA 
801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV 
851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with the probable TonB-dependent receptor HI121 of H.influenzae (accession number U32801) 
ORF133 and HI121 show 57% aa identity in 363aa overlap: 

Orf133: 31  IYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90 
            I EP+L K G K+A NHS ++SA+  DYFMPF +YSRTHRMPNIQEM+FSQ+ ++GV+TA
HI121:  563 INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMPNIQEMFFSQVSNAGVNTA 622 

Orf133: 91  LKPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWV 150 
            LKPE+++T+Q GF TYKKGL  QDD LG+KLVGYRS I NYIHNVYG WW     +P+W
HI121:  623 LKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYIHNVYGVWW--RDGMPTWA 680 

Orf133: 151 SSTGLAYTIQHRXFXDKVHXXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNN 210 
             S G  YTI H+ +   V          YD GRFF N+SYAYQ++ QPTN++DAS  PNN
HI121:  681 ESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAYQRTNQPTNYADASPRPNN 740 

Orf133: 211 ASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYID 270 
            AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW   KLTLG A RY+GKS RAT EE YI+
HI121:  741 ASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLAARYYGKSKRATIEEEYIN 800 

Orf133: 271 GTNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDP 330 
            G+     +  R+    ++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP
HI121:  801 GSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKDLIIKAEVQNLLDKRYVDP 859 

Orf133: 331 LDAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMS 390 
            LDAGNDAA +RYYSS      +  + C  D + C    GG+ K+VL NFARGRT++++++
HI121:  860 LDAGNDAASQRYYSSL-----NNSIECAQDSSAC----GGSDKTVLYNFARGRTYILSLN 910 

Orf133: 391 YKF 393 
            YKF
HI121:  911 YKF 913 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF133 shows 90.8% identity over a 392aa overlap with an ORF (ORF133a) from strain A of N. 
meningitidis: 



10 20 30 

orf 133 .pep PGYYGS DDE FKRAFGENS PTXKKHCNRSCGI 

ill I I I I I I I I I II I I I I t II 1:1111 
orf 133a FYFDAALKKD I YRLN Y STNTVGYRFGGX YTG YYXS DDE FKRAFGEN S PT YXKHCNQSCG I 

450 460 470 480 490 500 



40 50 60 70 80 90 

orf 133 . pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
I I I I M I I I I I I I I I i I I M I I II II I I I I I I I I I I I M M 1 I I I I I I I I I I I I I 1 ! I I I 
orf 133a YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
510 520 530 540 550 560 



100 110 120 130 140 150 

orf 133 . pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 

ill lllllllll llllll I MM I 1 I I I I I 1 1 I I I I MINI :MII! 

orf 133a KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVS 
570 580 590 600 610 620 



160 170 180 190 200 210 

orf 133 . pep STGLAYT IQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASES PNNA 
I 1 I 1 I I t I I I I I MM: Ml II ( I I M II II II I I I II II M I I I I II I 

orf 133a STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNA 
630 640 650 660 670 680 



220 230 240 250 260 270 

orf 133 . pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKS IRATAEERYI DG 

M I I M I I II I I I II I M I I I I I I II I I II I I I II I I I I M I I I II I I II I I II I I I I I 
orf 133a SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDX 
690 700 710 720 730 740 



280 290 300 310 320 330 

orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 
III I I I I I I I I It I I I I I II I I I I I I I I I I II I I I I I II I I I I I I I I M I II I 
orf133a TNGXXTSNFRQLGKRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL 

750 760 770 780 790 800 

340 350 360 370 380 390 

or f 133 . pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 
I I I I I I I : : I II I I I I I I I I I : I I I I I : I I I I I I I I I I I I I II I I I I I I 111:1111 
orfl33a DAGNDAATQRYYSSFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY 
810 820 830 840 850 860 



orfl33.pep 



orf 133a 



KFX 
I I I 
KFX 
870 



A partial ORF133a nucleotide sequence <SEQ ID 879> is: 

1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA 
51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG 
101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT 
151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC 
201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT 
251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC 
301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC 
351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA 
401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT 
451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC 
501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC 
551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG 
601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA 
651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA 
701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC 
751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT 
801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT 
851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC 
901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA 
951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG 
1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA 
1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT 
1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC 
1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC 
1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG 
1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC 
1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC 
1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC 
1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT 
1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG 
1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA 
1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG 
1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC 
1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC 
1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT 
1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC 
1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG 
1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG 
1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT 
1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC 
2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG 
2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG 
2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG 
2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT 
2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC 
2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN 
2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG 
2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG 
2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC 
2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA 
2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC 
2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT 
2601 GAGCTACAAG TTTTAA 

This encodes a protein having (partial) amino acid sequence <SEQ ID 880>: 

1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI 

51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV 

101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG 

151 NAMAAIGARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL 

201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI 

251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR 

301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT 

351 GWGLXKDFET YNNAKILDLX NTSTFRLPRE TELQTTLGFN YFHNEYGKNR 

401 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF 

451 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX 

501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP 

551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV 

601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG 

651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG 

701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT 

751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL 

801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS 

851 KSVLTNFARG XTFLITMSYK F* 

ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap: 

10 20 30 40 

orf 133a . pep KDKKVFT DARAVSTRQDIFKSXENLDNIVRXI PGAFTXQXKS 

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 'M I II 
orfl33-l EAQ IQVLEDVHVKAKRVPKDKKVFT DARAVSTRQDI FKS SENLDNI VRS I PGAFTQQDKS 

10 20 30 40 50 60 

50 60 70 80 90 100 

orf 133a . pep SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDWK 
II MINI I I I I I I I I I I I II I I I II I II I I II I I I II I I I II I II I I I I I I t I I I 
orf 133-1 SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWK 

70 80 90 100 110 120 

110 120 130 140 150 160 

or f 133a . pep GSFSGSAG INS LAGS ANLRTLXVDDWQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL 
II INI ill III IMIII I II I IN III! II II II I II III I II II I I III MUM I 
orf 133-1 GSFSGSAGINSLAGSANLRTLGVDDWQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL 

130 140 150 160 170 180 

170 180 190 200 210 220 

orf 133a . pep ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFEQEGGLKFNSNSGK 
M I II I II I I I M II I I I I I I I I I II I M II II I I I I II II II I II I I I : I II II : M I 
orf 133-1 ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK 

190 200 210 220 230 240 

230 240 250 260 270 280 

orf 133a . pep WERDFQKS YWKTKWYQKYDAPQELQKY I EGHDKS WRENLAPQYD I T P I D PS S LKXQS AGN 
1 1 | I : I : : II I I : : I : II I II I I I I II II I II I I I I I II I I I I I I II I I I I I 
orf 133-1 WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN 

250 260 270 280 290 

290 300 310 320 330 340 

orf 133a . pep LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 
II I II I M M 1 1 II II I I I 1 M II I I II I I II I I I M I I I I I I 1 I i M I I I I I I II I i II 
orf 133-1 LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 
300 310 320 330 340 350 

350 360 370 380 390 400 

orf 133a . pep YPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP 
IIIMIMIMI 1 1 I I f I I I I 1 I I I I M : I II M I I I I II II M I M I II II M I I M 
orf 133-1 YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP 
360 370 380 390 400 410 

410 420 430 440 450 460 

orf 133a . pep EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
MIMIIIIM 1 H I 1 1 I I I It 1 1 I i M I t I I M M I M I 1 1 I t t i I I I 1 1 11 ! M I I I 
orfl33-l EELGL FFDGPDQDNGLYSYLGRFKGDKGLLPQKST I VQPAGSQYFNT FYFDAALKKDI YR 

420 430 440 450 460 470 
470 480 490 500 510 520 

orf 133a . pep LNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGIYEPVLKKYGKKRA 
Mil llllllllll I MM I 1 1 I I I I I I t I I f I I I II I I: I II II I I I I II I Mill 
orf 133-1 LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGIYEPVLKKYGKKRA 
480 490 500 510 520 530 

530 540 550 560 570 580 

orf 133a . pep NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
I I II II I II 1 1 II II 1 1 1 1 M I 1 1 II 1 1 1 1 M I II I M 1 1 1 1 1 1 I II II 1 1 II II II M I 
orf 133-1 NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
540 550 560 570 580 590 

590 600 610 620 630 640 

orf 133a . pep T YKKGLLKQDDI LGLKLVGYRSRI DXYIHNVYGKWWDLNGNI PSWVS STGLAYTIQHRNF 
llllllllll! MM MM Mill II I II II I I III II :| I I II Ml II Ml II I Ml 
orf 133-1 TYKKGLLKQDDTLGLKLVGYRSRI DNYIHNVYGKWWDLNGDI PSWVS STGLAYTIQHRNF 

600 610 620 630 640 650 

650 660 670 680 690 700 

orf 133a . pep KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
Ml Ml M II I I I II M M III II I II II I Ml III I Ml II I II M III I Ml II Mi 
orf 133-1 KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
660 670 680 690 700 710 

710 720 730 740 750 760 

orf 133a . pep RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKS I RATAEERYI DXTNGXXTSNFRQLG 
I 1 1 M I 1 I I I M I M I I I I I I I II I I I M 1 1 1 1 I I II M M I I I I I III lllllll I 
orf 133-1 RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSNFRQLG 
720 730 740 750 760 770 

770 780 790 800 810 820 

orf 133a . pep KRS IXQTETLARQPLI FDXYAAYE PKKXLI FRAEVKNLFDRRYI DPLDAGNDAATQRYYS 
I II I III I II III Mi I M III II I II M Ml II II Ml M I I I II I II II II I I M 
orf 133-1 KRSIKQTETLARQPLIFDFYAAYEPKKNLI FRAEVKNLFDRRYI DPLDAGNDAATQRYYS 

780 790 800 810 820 830 

830 840 850 860 870 

orf 133a . pep SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX 
M II I I I M : II M I : II I I I I I I I I I I I I I I I Ml I MIMIIIIII 
orf 133-1 SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX 
840 850 860 870 880 






Homology with a predicted ORF from N.gonorrhoeae 

ORF133 shows 92.3% identity over 392 aa overlap with a predicted ORF (ORF133ng) from 
N.gonorrhoeae: 



orf133.pep                              PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 31
orf133ng   FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 560

orf133.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91
orf133ng   YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620

orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 151
orf133ng   KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVG 680

orf133.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 211
orf133ng   STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 740

orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271
orf133ng   SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 800

orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 331
orf133ng   TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPL 860

orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 391
orf133ng   DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920

orf133.pep KF 393
orf133ng   KF 922

The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a 
protein having amino acid sequence <SEQ ID 882>: 



1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK 

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLLNLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 



A variant was also identified, being encoded by the gonococcal DNA sequence <SEQ ID 883>: 



1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT 

51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG 

101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA 

151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca 

201 gGATGTGTTC AAATCCGGCG AAAACCTCGA CAACATCGTA CGCAGCATAC 

251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT 

301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT 

351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT 

401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT 

451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG 

501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA 

551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA 

601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC 

651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT 

701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT 

751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT 

801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA 

851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC 

901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA 

951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC 

1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT 

1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA 

1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA 

1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT 

1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT 

1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT 

1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC 

1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA 

1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC 

1451 CTCAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG 

1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG 

1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG 

1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC 

1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA 

1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT 

1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG 
1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA 

1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA 

1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG 

1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA 

2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC 

2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC 

2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT 

2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA 

2201 GCGAATCGCC CAACAATGCC tccaaAGAAG ACCAACTCAA ACAAGGTTAT 

2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT 

2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT 

2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC 

2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT 

2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG 

2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC 

2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC 

2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG 

2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA 

2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC 

2751 GATGAGCTAC AAGTTTTAA 



This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-1>: 



1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK 

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 



ORF133ng-1 and ORF133-1 show 96.2% identity in 889 aa overlap: 



10 20 30 40 50 60 

orf 133ng-l . pep S FRLKP I CFYLMGVMLYHHS YAE DAGRAGSEAQIQVLEDVHVKAKRVPKDKKVFTDARAV 

I I I I I I I I I I I i I I I I I I I I i I I I I t I I I I 
orfl33-l EAQIQVLEDVHVKAKRV PKDKKV FTDARAV 

10 20 30 



70 80 90 100 110 120 

orfl33ng-l.pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 
I I I I I : I I I : I II I I I I I I I I I I I I I I I I I I It I I I I I I I I I I I I I I I I I I I I I I I II I I 
orf 133-1 STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 

40 50 60 70 80 90 



130 140 150 160 170 180 

or f 133ng-l . pep TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 
I I I I I I I I I I 1 I I II I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAG INS LAGS ANLRTLGVDDWQGN 

100 110 120 130 140 150 

190 200 210 220 230 240 

or f 133ng-l . pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI 
I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II: I I I I I I I I I I I I I I 
or f 1 33-1 NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRSVAQNYRVGGGGQHI 

160 170 180 190 200 210 

250 260 270 280 290 300 

orf 133ng-l . pep GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE 

IMI I I I I 1 1 I i: I 1 1 1 1 1: I I t h Mllitllill !( I MIIIIMI 
orf 133-1 GNFGAEYLERRKQRYFVQEGALKFNSDSGKWERDLQRQQWKYKPYKNYNN-QELQKYIEE 

220 230 240 250 260 

310 320 330 340 350 360 

orf 133na-l . pep HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 

I | M I I I I I I I I I M I I I I I : II I I I I M I M I I I I I I I I I I I I I I I I I M : I I 

orf 133-1 HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII 
270 280 290 300 310 320 

370 380 390 400 410 420 

orf 133na-l .pep NRNYQFN YGLSLNPYTNLNLTAAYN SGRQKYPKGAKFTGWGLLKDFET YNNAKI LDLNNT 

I I I 1 I I I I I I I ! t I I 1 1 I I t I I I I 1 I I I 1 I I I t I * I I t I I K 1 I I I K I 1 I I i I I I I I 1 I I I 
or f 1 33- 1 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKI LDLNNT 

330 340 350 360 370 380 

430 440 450 460 470 ' 480 

orf 133na-l . pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 

II I I I I IMIMI I II I Mil II Ml I M MM! MUMIIMMMIt II I lilllll 
orf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 

390 400 410 420 430 440 

490 500 510 520 530 540 

orf 133ng-l .pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF 
M M M MIIIIIMI MM Mi M MIIIIMI ll:::IMI MIIIM IMMIMII 
orf 133-1 PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 
450 460 470 480 490 500 

550 560 570 580 590 600 ' 

orfl33ng-l.pep GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI 
MIM:||:||: 1 i I : 1 I I I II I I I II I I I I I I I M I M M I i I I I I I : I I M I I I I I I 
orf 133-1 GENSPTYKKHCNRSCGIYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNI 
510 520 530 540 550 560 

610 620 630 640 650 660 

orf 133ng-l . pep QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN 
I I I I I I t I I I I t I I I I 1 1 I I I I 1 I 1 t I I I I ! 1 I I I I I I 1 I 1 IIMIIIII1IIIIIMI 
orf 133-1 QEMYFSQIGDSGVHTALKPERANTWQFGFNTYECKGLLKQDDTLGLKLVGYRSRIDNYIHN 
570 580 590 600 610 620 

670 680 690 700 710 720 

orf 133ng-l .pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
I I I I 1 1 I 1 I I I I 1 1 1 I : I I I I I I I I : I I I I I I I I I 1 I I I I I I I ! M I 1 I I I I I I I I t I I I 
orf 133-1 VYGKWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
630 640 650 660 670 680 

730 740 750 760 770 780 

orf 133ng-l .pep S TQPTNFS DASE SPNNAS KE DQLKQGYGLSRVS ALPRDYGRLEVGTRWLGNKLTLGGAMR 

i I I f I I I I 1 I 1 I 1 I I I I I I I I I I 1 i I i I I t I I I I I t I I I I I I I t I i 1 1 I I I I 1 II 

orf 133-1 STQ PTN FS DAS ES PNN ASKE DQLKQGYGLSRVS ALPR DYGRLEVGTRWLGNKLTLGGAMR 

690 700 710 720 730 740 

790 800 810 820 830 840 

orf 133ng-l . pep YFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
1 1 1 I 1 I 1 I 1 1 I I I ! 1 I 1 1 1 I 1 1 I 1 ! I 1 I 1 I I I i I I 1 1 1 I 1 1 1 1 I 1 1 t I I I I I I I t 1 1 1 1 
orf 133-1 Y FGKS IRATAEERY I DGTNGGNT SN FRQLGKRS I KQTET LARQPL I FD FYAAYEPKKNL I 

750 760 770 780 790 800 

850 860 870 880 890 900 

orf 133ng-l .pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
1 1 1 I 1 I 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 I 1 M K 1 1 1 1 1 I M 1 1 1 1 I 1 1 I 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
orf 133-1 FRAEVKNLFDRRY I DPLDAGNDAATQRYYS S FDPKDKDE DVT CNADKT LCNGKYGGT SKS 

810 820 830 840 850 860 

910 920 
orfl33ng-l.pep VLTN FARGRT FLMTMS YKFX 
I I I II I I I I I I I I I I 1 I I I I 
orf 133-1 VLTN FARGRT FLMTMS YKFX 

870 880 

In addition, ORF133ng-1 is homologous to a TonB-dependent receptor in H.influenzae: 






sp|P45114|YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR 
>gi|1075372|pir||G64110 transferrin binding protein 1 precursor (tbp1) homolog - 
Haemophilus influenzae (strain Rd KW20) >gi|1574147 (U32801) transferrin binding 
protein 1 precursor (tbp1) [Haemophilus influenzae] Length = 913 
Score = 930 bits (2377), Expect = 0.0 

Identities = 476/921 (51%), Positives = 619/921 (66%), Gaps = 72/921 (7%) 



38 QVLEDVHVKAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAETQQDKSSGIV 97 

+ L + V K + DKK FT+A+A STR++VFK + +D ++RS I PGAFTQQDK SG+V 
29 ETLGQI DWEKVI SNDKKPFTEAKAKSTRENVFKETQTI DQVIRS I PGAFTQQDKGSGW 88 

98 SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFS 157 

S+NIRG++G GRVNTMVDG+TQTFYST+ D+G++GGSSQFGA++D NFIAG+DV K +FS 
89 SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 148. 

158 GSAGINSLAGSANLRTLGVDDWQXXXXXXXXXXXXXXXXXXXXXAMAAIGARKWLESGA 217 

G++GIN+LAGSAN RTLGV+DV+ M RKWL++G 

14 9 GASGINALAGSANFRTLGVNDVITDDKPFGIILKGMTGSNATKSNFMTMAAGRKWLDNGG 208 

218 SVGVLYGHSRRGVAQNYRVGGGGQHIGNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERD 277 

VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 
209 YVGWYGYSQREVSQDYRI-GGGERLASLGQDILAKEKEAYF-RNAGYILNP-EGQWTPD 265 

278 LQRQYWK TKWY KKYEDPQELQK YIEE 303 

L +++W +Y KK +D ++LQK I EE 

2 66 LSKKHWSCNKPDYQKNGDCSYYRIGSAAKTRREILQELLTNGKKPKDIEKLQKGNDGIEE 325 

304 HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 363 

DKS+ N QY + PI+P L+ +S +L K EY AQ R L+ +IGSRKI 

326 TDKSFERN-KDQYSVAPIEPGSLQSRSRSHLLKFEYGDDHQNLGAQLRTLDNKIGSRKIE 384 

364 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 423 

NRNYQ NY + N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 

385 NRNYQVNYNFNNNSYLDLNLMAAHNIGKTIYPKGGFFAGWQVADKLITKNVANIVDINNS 444 

424 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSY — LGRFKGDKG 481 

TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ D GLYS+ GR+ G K 
445 HTFLLPKEIDLKTTLGFNYFTNEYSKNRFPEELSLFYNDASHDQGLYSHSKRGRYSGTKS 504 

482 LLPQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKR 541 

LLPQ+S I+QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY 
505 LLPQRSVILQPSGKQKFKTVYFDTALSKGIYHLNYSVNFTHYAFNGEYVGY 555 

542 AFGENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMP 601 

EN+ + + EP+L K G K+A NHS ++SA+ DYFMPF YSRTHRMP 

556 ENTAGQQ — : INEP I LHKSGHKKAFNHS AT LSAELSDYFMPFFT YSRTHRMP 604 

602 NIQEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYI 661 

NIQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NYI 
605 NIQEMFFSQVSNAGVNTALKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYI 664 

662 HNVYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAY 721 

HNVYG WW +P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 

665 HNVYGVWW — RDGMPTWAESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAY 722 

722 QKSTQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGA 781 
Q++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A 

723 QRTNQPTNYADASPRPNNASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLA 782 

782 MR YFGKS IRAT AEERY I DGTNGGNTSN VRQLGKRS I KQTET LARQ PLI FDFYAA YE PBCKN 841 

RY+GKS RAT EE YI+G+ + +R+ ++K+TE + +QP+I D + +YEP K+ 

783 ARYYGKSKRATIEEEYINGSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKD 841 

842 LIFRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTS 901 

LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS + + C D + C GG+ 
842 LIIKAEVQNLLDKRYVDPLDAGNDAASQRYYSSL NNSIECAQDSSAC GGSD 892 

902 KS VLTN FARGRT FLMTMS YKF 922 

K+VL NFARGRT++++++YKF 
893 KTVLYN FARGRT Y I LSLN YKF 913 
The underlined motif in the gonococcal protein (also present in the meningococcal protein) is 
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these 
proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
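
By way of illustration only, a P-loop prediction of this kind can be reproduced with a simple pattern scan. The sketch below assumes the PROSITE consensus for ATP/GTP-binding site motif A, [AG]-x(4)-G-K-[ST]; the function name `find_p_loops` and the test fragment (a 30-residue stretch copied from the gonococcal sequence listed above) are illustrative choices and not part of the original analysis. Whether the hit reported here is the same span that was underlined in the original listing cannot be confirmed from the text.

```python
import re

# PROSITE-style consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST]. The lookahead lets overlapping hits be reported too.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str):
    """Return (1-based start, matched residues) for every P-loop match."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein)]


# Fragment taken from the gonococcal protein listing (assumed test input).
fragment = "WLGNKLTLGGAMRYFGKSIRATAEERYIDG"
print(find_p_loops(fragment))  # [(11, 'AMRYFGKS')]
```

The positions returned are relative to the fragment supplied, so a full-length sequence should be passed in when absolute coordinates are wanted.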

Example 104 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 885>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT. . 

This corresponds to the amino acid sequence <SEQ ID 886; ORF112>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSVINVR EMLPDH... 
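
The correspondence between a listed nucleotide sequence and its amino acid sequence can be checked with a short translation routine. The following is a minimal sketch, assuming the standard codon assignments (which are also those of the bacterial translation table) and assuming that any codon containing an ambiguous or unreadable base is reported as X, as in the listings here. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1; '*' is stop, 'X' an unresolved codon."""
    # Drop position numbers, spaces and punctuation; accept lower-case bases.
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 bases of the partial sequence listed above.
print(translate("1 ATGAACCTGA TTTCACGTTA CATCATCCGT"))  # MNLISRYIIR
```

Applied to a complete listing, the routine makes it straightforward to confirm that an X in a protein sequence lines up with an N in the underlying DNA.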

Further work revealed a further partial nucleotide sequence <SEQ ID 887>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG 

951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG ... 

This corresponds to the amino acid sequence <SEQ ID 888; ORF112-1>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICXG LLFHLAGRLF GFTSQL... 

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF112 shows 96.4% identity over a 166 aa overlap with an ORF (ORF112a) from strain A of 
N.meningitidis: 

orf112.pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60
orf112a    MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 60

orf112.pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120
orf112a    AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120

orf112.pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 166
orf112a    VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 180

orf112a    ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 240

The ORF112a nucleotide sequence <SEQ ID 889> is: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG 

151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAN CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG 

1001 NCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGCTA A 

This encodes a protein having the amino acid sequence <SEQ ID 890>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX 

51 GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQXXSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG 

301 LKXFGGICLG LLFHLAGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI 

351 RKQEKR* 

ORF112a and ORF112-1 show 96.3% identity in 326 aa overlap: 

orf112a.pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 60
orf112-1    MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60

orf112a.pep AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120
orf112-1    AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120

orf112a.pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 180
orf112-1    VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 180

orf112a.pep ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 240
orf112-1    ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 240

orf112a.pep DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 300
orf112-1    DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 300

orf112a.pep LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX 357
orf112-1    LKLFGGICXGLLFHLAGRLFGFTSQL 326



Homology with a predicted ORF from N.gonorrhoeae 

ORF112 shows 95.8% identity over 166 aa overlap with a predicted ORF (ORF112ng) from 
N.gonorrhoeae: 

orf112.pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60
orf112ng   MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60

orf112.pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120
orf112ng   AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120

orf112.pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 166
orf112ng   VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 180



The complete length ORF112ng nucleotide sequence <SEQ ID 891> is: 



1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT 

201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag 

401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG 

451 AAAGAAAAAa ccAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC 

651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT 

801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC 

851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG 

1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGTTG A 



This encodes a protein having amino acid sequence <SEQ ID 892>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL 

101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI 

351 RKQEKR* 

ORF112ng and ORF112-1 show 94.2% identity in 326 aa overlap: 

orf112ng MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60
orf112-1 MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60

orf112ng AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120
orf112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120

orf112ng VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 180
orf112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 180

orf112ng ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP 240
orf112-1 ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 240

orf112ng DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG 300
orf112-1 DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 300

orf112ng LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX 357
orf112-1 LKLFGGICXGLLFHLAGRLFGFTSQL 326



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
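
The identity figure quoted for an alignment of this kind is the number of positions carrying the same residue in both sequences, divided by the length of the overlap. The following is a minimal Python sketch of that calculation, assuming two already-aligned, ungapped sequences; the function name `percent_identity` and the toy inputs are illustrative and are not taken from the application, and positions marked X (unknown residue) are assumed not to count as identical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions over the overlap of two pre-aligned sequences."""
    overlap = min(len(seq_a), len(seq_b))
    if overlap == 0:
        raise ValueError("sequences must not be empty")
    # Count positions where both residues agree; an unknown residue (X)
    # is assumed never to count as a match.
    matches = sum(
        1
        for a, b in zip(seq_a[:overlap], seq_b[:overlap])
        if a == b and a != "X"
    )
    return 100.0 * matches / overlap


# Toy example: 19 of 20 aligned positions agree.
print(round(percent_identity("MNLISRYIIRQMAVMAVYAL", "MNLISRYIIRQMAVMAVYVL"), 1))  # 95.0
```

On this definition, the 94.2% figure reported over a 326 aa overlap is consistent with 307 identical positions (307/326 ≈ 94.2%).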



It will be appreciated that the invention has been described by means of example only, and that 
modifications may be made whilst remaining within the spirit and scope of the invention. 
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TABLE I - PCR primers 



ORF       Primer   Sequence                                       Restriction sites
ORF 1     Forward  CGCGGATCCGCTAGC-GGACACACTTATTTCGG              BamHI-NheI
          Reverse  CCCGCTCGAG-CCAGCGGTAGCCTAATT                   XhoI
ORF 2     Forward  GCGGATCCCATATG-TTTGATTTCGGTTTGGG               BamHI-NdeI
          Reverse  CCCGCTCGAG-GACGGCATAACGGCG                     XhoI
ORF 2-1   Forward  GCGGATCCCATATG-TTTGATTTCGGTTTGGG               BamHI-NdeI
          Reverse  CCCGCTCGAG-TGATTTACGGACGCGCA                   XhoI
ORF 4     Forward  GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC              BamHI-NdeI
          Reverse  CCCGCTCGAG-TTTGGCTGCGCCTTC                     XhoI
ORF 5     Forward  GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC          NdeI-NcoI
          Forward  CGGGATCC-ATGGAAGGCGCACAAC                      BamHI
          Reverse  CCCGCTCGAG-GACTGTGCAAAAACGG                    XhoI
ORF 6     Forward  CGCGGATCCCATATG-ACCCGTCAATCTCTGCA              BamHI-NdeI
          Reverse  CCCGCTCGAG-TGCGCCGAACACTTTC                    XhoI
ORF 7     Forward  CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC              BamHI-NheI
          Reverse  CCCGCTCGAG-TTTCAAAATATATTTGCGGA                XhoI
ORF 8     Forward  GCGGATCCCATATG-GCTCAACTGCTTCGTAC               BamHI-NdeI
          Reverse  CCCGCTCGAG-AGCAGGCTTTGGCGC                     XhoI
ORF 9     Forward  CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA              BamHI-NdeI
          Reverse  CCCGCTCGAG-TTTCCGAGGTTTTCGGG                   XhoI
ORF 10    Forward  GCGGATCCCATATG-GACACAAAAGAAATCCTC              BamHI-NdeI
          Reverse  CCCGCTCGAG-TAATGGGAAACCTTGTTTT                 XhoI
ORF 11    Forward  GCGGATCCCATATG-GCGGTCAACCTCTACG                BamHI-NdeI
          Reverse  CCCGCTCGAG-GGAAACGACTTCGCC                     XhoI
ORF 13    Forward  CGCGGATCCCATATG-GCTCTGCTTTCCGCGC               BamHI-NdeI
          Reverse  CCCGCTCGAG-AGGGTGTGTGATAATAAG                  XhoI
ORF 15    Forward  GGAATTCCATATGGCCATGG-GCGGGACACTGACAG           NdeI-NcoI
          Forward  CGGGATCC-TGCGGGACACTGACAGG                     BamHI
          Reverse  CCCGCTCGAG-AGGTTGGCCTTGTCTATG                  XhoI
ORF 17    Forward  GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG           NdeI-NcoI
          Forward  CGGGATCC-ATTGCCGGCCTGTTCG                      BamHI
          Reverse  CCCGCTCGAG-AAGCAGGTTGTACAGC                    XhoI


ORF 18    Forward  GCGGATCCCATATG-ATTTTGCTGCATTTGGAT              BamHI-NdeI
          Reverse  CCCGCTCGAG-TCTTCCAATTTCTGAAAGC                 XhoI
ORF 19    Forward  GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC         NdeI-NcoI
          Forward  CGGGATCC-TTCGCCAGTGTTTTTACCG                   BamHI
          Reverse  CCCGCTCGAG-GGTGTTTTTGAAGCTGCC                  XhoI
ORF 20    Forward  GGAATTCCATATGGCCATGG-TCGGCGCGGGTATG            NdeI-NcoI
          Forward  CGGGATCC-TTCGGCGCGGGTATG                       BamHI
          Reverse  CCCGCTCGAG-CGGCGAGCGAGAGCA                     XhoI
ORF 22    Forward  GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT    NdeI-NcoI
          Forward  CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC          BamHI
          Reverse  CCCGCTCGAG-ATTATGATAGCGGCCC                    XhoI
ORF 23    Forward  CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC             BamHI-NdeI
          Reverse  CCCGCTCGAG-TTTAAACCGATAGGTAAACG                XhoI
ORF 24    Forward  GGAATTCCATATGGCCATGG-TGATGCCGGAAATGGTG         NdeI-NcoI
          Forward  CGGGATCC-ATGATGCCGGAAATGGTG                    BamHI
          Reverse  CCCGCTCGAG-TGTCAGCGTGGCGCA                     XhoI
ORF 25    Forward  GCGGATCCCATATG-TATCGCAAACTGATTGC               BamHI-NdeI
          Reverse  CCCGCTCGAG-ATCGATGGAATAGCCG                    XhoI
ORF 26    Forward  GCGGATCCCATATG-CAGCTGATCGACTATTC               BamHI-NdeI
          Reverse  CCCGCTCGAG-GACATCGGCGCGTTTT                    XhoI
ORF 27    Forward  GGAATTCCATATGGCCATGG-AGACCTATTCTGTTTA          NdeI-NcoI
          Forward  CGGGATCC-CAGACCTATTCTGTTTATTTTAATC             BamHI
          Reverse  CCCGCTCGAG-GGGTTCGATTAAATAACCAT                XhoI
ORF 28    Forward  GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT        NdeI-NcoI
          Forward  CGGGATCC-AACGGCTGTACGTTGATG                    BamHI
          Reverse  CCCGCTCGAG-TTTGTCAGAGGAATTCGCG                 XhoI
ORF 29    Forward  GCGGATCCCATATG-AACGGTTTGGATGCCCG               BamHI-NdeI
          Forward  CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG              BamHI-NheI
          Reverse  CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG               XhoI
ORF 32    Forward  CGCGGATCCCATATG-AATACTCCTCCTTTTG               BamHI-NdeI
          Reverse  CCCGCTCGAG-GCGTATTTTTTGATGCTTTG                XhoI
ORF 33    Forward  GCGGATCCCATATG-ATTGATAGGGATCGTATG              BamHI-NdeI
          Reverse  CCCGCTCGAG-TTGATCTTTCAAACGGCC                  XhoI
ORF 35    Forward  GCGGATCCCATATG-TTCAGAGCTCAGCTT                 BamHI-NdeI
          Forward  CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT                BamHI-NheI
          Reverse  CCCGCTCGAG-AAACAGCCATTTGAGCGA                  XhoI
ORF 37    Forward  GCGGATCCCATATG-GATGACGTATCGGATTTT              BamHI-NdeI
          Reverse  CCCGCTCGAG-ATAGCCCGCTTTCAGG                    XhoI
ORF 58    Forward  CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT              BamHI-NheI
          Reverse  CCCGCTCGAG-AGCATTGTCQAAGGGGAC                  XhoI
ORF 65    Forward  GGAATTCCATATGGCCATGG-TGCTGTATCTGAATCAAG        NdeI-NcoI
          Forward  CGGGATCC-TTGCTGTATCTGAATCAAGG                  BamHI
          Reverse  CCCGCTCGAG-CCGCATCGGCAGACA                     XhoI
ORF 66    Forward  GCGGATCCCATATG-TACGCATTTACCGCCG                BamHI-NdeI
          Reverse  CCCGCTCGAG-TGGATTTTGCAGAGATGG                  XhoI
ORF 72    Forward  CGCGGATCCCATATG-AATGCAGTAAAAATATCTGA           BamHI-NdeI
          Reverse  CCCGCTCGAG-GCCTGAGACCTTTGCAA                   XhoI
ORF 73    Forward  GCGGATCCCATATG-AGATTTTTCGGTATCGG               BamHI-NdeI
          Reverse  CCCGCTCGAG-TTCATCTTTTTCATGTTCG                 XhoI
ORF 75    Forward  GCGGATCCCATATG-TCTGTCTTTCAAACGGC               BamHI-NdeI
          Reverse  CCCGCTCGAG-TTTGTTTTTGCAAGACAG                  XhoI
ORF 76    Forward  GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC            NheI-NdeI
          Reverse  CGGGATCC-TTACGGTTTGACACCGTT                    BamHI
ORF 79    Forward  CGCGGATCCCATATG-GTTTCCGCCGCCG                  BamHI-NdeI
          Reverse  CCCGCTCGAG-GTGCTGATGCGCTTCG                    XhoI
ORF 83    Forward  GCGGATCCCATATG-AAAACCCTGCTGCTGC                BamHI-NdeI
          Reverse  CCCGCTCGAG-GCCGCCTTTGCGGC                      XhoI
ORF 84    Forward  GCGGATCCCATATG-GCAGAGATCTGTTTG                 BamHI-NdeI
          Reverse  CCCGCTCGAG-GTTTGCCGATCCGACCA                   XhoI
ORF 85    Forward  CGCGGATCCCATATG-GCGGTTTGGGGCGGA                BamHI-NdeI
          Reverse  CCCGCTCGAG-TCGGCGCGGCGGGC                      XhoI
ORF 89    Forward  GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA          NdeI-NcoI
          Forward  CGGGATCC-GCCATACCTTCTTATCAGAG                  BamHI
          Reverse  CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC               XhoI
ORF 97    Forward  GCGGATCCCATATG-CATCCTGCCAGCGAAC                BamHI-NdeI
          Reverse  CCCGCTCGAG-TTCGCCTACGGTTTTTTG                  XhoI


ORF 98    Forward  GCGGATCCCATATG-ACGGTAACTGCGG                   BamHI-NdeI
          Reverse  CCCGCTCGAG-TTGTTGTTCGGGCAAATC                  XhoI
ORF 100   Forward  GCGGATCCCATATG-TCGGGCATTTACACCG                BamHI-NdeI
          Reverse  CCCGCTCGAG-ACGGGTTTCGGCGGAA                    XhoI
ORF 101   Forward  GCGGATCCCATATG-ATTTATCAAAGAAACCTC              BamHI-NdeI
          Reverse  CCCGCTCGAG-TTTTCCGCCTTTCAATGT                  XhoI
ORF 102   Forward  GCGGATCCCATATG-GCAGGGCTGTTTTACC                BamHI-NdeI
          Reverse  CCCGCTCGAG-AAACGGTTTGAACACGAC                  XhoI
ORF 103   Forward  GCGGATCCCATATG-AACCACGACATCAC                  BamHI-NdeI
          Reverse  CCCGCTCGAG-CAGCCACAGGACGGC                     XhoI
ORF 104   Forward  GCGGATCCCATATG-ACGTGGGGAACGC                   BamHI-NdeI
          Reverse  CCCGCTCGAG-GCGGCGTTTGAACGGC                    XhoI
ORF 105   Forward  GCGGATCCCATATG-ACCAAATTTCAAACCCCTC             BamHI-NdeI
          Reverse  CCCGCTCGAG-TAAACGAATGCCGTCCAG                  XhoI
ORF 106   Forward  GCGGATCCCATATG-AGGATAACCGACGGCG                BamHI-NdeI
          Reverse  CCCGCTCGAG-TTTGTTCCCGATGATGTT                  XhoI
ORF 109   Forward  G C GG ATC CC AT AT G - G AAG AT 1IA1A1A1 AA 1 AL, 1 U G  BamHI-NdeI
          Reverse  CCCGCTCGAG-ATCAGCTTCGAACCGAAG                  XhoI
ORF 110   Forward  AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC             EcoRI
          Reverse  AAACTGCAG-GGAAAACCACATCCGCACTCTGCC             PstI
ORF 111   Forward  AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA             EcoRI
          Reverse  AAACTGCAG-TCTGCGCGTTTTCGGGCAGGGTGG             PstI
ORF 113   Forward  AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG  EcoRI
          Reverse  AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG       PstI
ORF 115   Forward  AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG           EcoRI
          Reverse  AAAAAAGTCGAC-CTATTTTTTAGGGGC HTTGC TTGTTTGAAAAGCCTGCC  SalI
ORF 119   Forward  AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG        EcoRI
          Reverse  AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC       PstI
ORF 120   Forward  AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG             EcoRI
          Reverse  AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT             PstI
ORF 121   Forward  AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC              EcoRI
          Reverse  AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC       PstI
ORF 122   Forward  AAAAAAGTCGAC-ATGTCTTACCGCGCAAGCAGTTCTCC        SalI
          Reverse  AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC    PstI
ORF 125   Forward  AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT             EcoRI
          Reverse  AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG             PstI
ORF 126   Forward                                                 EcoRI
          Reverse  AAACTGCAG-TTAATCTTGTCTTCCGATATAC               PstI
ORF 127   Forward                                                 EcoRI
          Reverse  AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC          SalI
ORF 128   Forward  7v7\.AP7\7\ , P r Pp_7\ r T , pp2\7iPPT , p , pppp pt ncur' c pp  EcoRI
          Reverse  AAACTGCAG-CTA 3TGCAATGCGCCGCC GCGGGAATG nTGAGCAGGCG  PstI
ORF 129   Forward  AAAbAATTU - AibbAI i 1 lObl 1 i ibALAl 1A1 J. iAv^bAAlALA^b  EcoRI
          Reverse  AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG          PstI
ORF 130   Forward  AAAGAATTC-GCAGTACTTGCCAT 2X^TCCjvjT(jv^Cj  EcoRI
          Reverse  AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT             PstI
ORF 131   Forward  PPP P 71 TPPP 7A T 71 TP —P 7A 71 7A TTP^^P 7A 7A T A 7A 7A A T  BamHI-NdeI
          Reverse  CCCGCTCGAG-CCAGCGGACGCGTTC                     XhoI
ORF 132   Forward  PP P P 7A T PP P 7A T 71 T P — 7A 7A 71 P 7A 7V p p p p rp T *P P  BamHI-NdeI
          Reverse  CCCGCTCGAG-CCAATCTGCCAGCCGT                    XhoI
ORF 133   Forward  PPPPPBTPPPBTnTP-PD 7AP*7A r PP , P'7Ar*P'PP , r^P' , C  BamHI-NdeI
                   LuLbbn 1 CUv^A 1 A 1 b—VaAAvoAl bUAbobLbLvj
          Reverse  CCCGCTCGAG-AAACTTGTAGCTCATCGT                  XhoI
ORF 134   Forward  P.PnPATPPPATATP.-TPTPTP/^AAP4P , Af4TA r PTfi  BamHI-NdeI
          Reverse  CCCGCTCGAG-ATCCTGTGCCAATGCG                    XhoI
ORF 135   Forward  PPPPBTPPPATBTP-PPPTPTPa 7A 7A 7\ STPTTT  BamHI-NdeI
                   bv^bbAlCULAl AI b~VrfUblCIbAAAAAbLl 1 1
          Reverse  CCCGCTCGAG-AAATACCGCTGAGGATG                   XhoI
ORF 136   Forward  PPPPP 2*. TPPPPT71PP— ATP 71 APPfZP.PP.T AT AP.PP  BamHI-NheI
                   VrfbL/<jo/iX v^v^vjC 1 nV3L n 1 vannoVybbL u ihlrtoUVy
          Reverse  CCCGCTCGAG-TTCCGAATATTTGGAACTTTT               XhoI
ORF 137   Forward  CCZ PP4P4 A T CCC A T A T P4 — f4f4n A r fl H PP.fi Pi AAA T A  BamHI-NdeI
          Reverse  CCCGCTCGAG-ATAACGGTATGCCGCC                    XhoI
ORF 138   Forward  GCGGATCCCATATG-TTTCGTTTACAATTCAGGC             BamHI-NdeI
          Reverse  CCCGCTCGAG-CGGCGTTTTATAGCGG                    XhoI
ORF 139   Forward  GCGGATCCCATATG-GCTTTTTTGGCGGTAATG              BamHI-NdeI
          Reverse  CCCGCTCGAG-TAACGTTTCCGTGCGTTT                  XhoI



WO 99/24578 



-492- 



PCT/IB98/01665 



ORF 140   Forward  bbbbAl bUbAiAl b— I i GCCCACAGGCAGC  BamHI-NdeI
          Reverse  CCCGCTCGAG-GACGATGGCAAACAGC                    XhoI
ORF 141   Forward  bbbbAl vA,UAIAl b-kA*b I L i bAAGCAGTC ±  BamHI-NdeI
          Reverse  CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT               XhoI
ORF 142   Forward  bbbbAl bbbAl AI b-bAl AAT rCTGGTAGTGAAG  BamHI-NdeI
          Reverse  CCCGCTCGAG-AAACGTATAGCCTACCT                   XhoI
ORF 143   Forward  GCGGATCCCATATG-GATACCGCTTTGAACCT               BamHI-NdeI
          Reverse  CCCGCTCGAG-AATGGCTTCCGCAATATG                  XhoI
ORF 144   Forward  GCGGATCCCATATG-ACCTTTTTACAACGTTTGC             BamHI-NdeI
          Reverse  CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG                XhoI
ORF 147   Forward  GCGGATCCCATATG-TCTGTCTTTCAAACGGC               BamHI-NdeI
          Reverse  CCCGCTCGAG-TTTGTTTTTGCAAGACAG                  XhoI



NB: 

- restriction sites are underlined 



- for ORFs 110-130, where the ORF itself carries an EcoRI site (e.g. ORF122), a SalI site
was used in the forward primer instead. Similarly, where the ORF carries a PstI site (e.g.
ORFs 115 and 127), a SalI site was used in the reverse primer.
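
The substitution rule in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `choose_cloning_sites` and the toy sequences are hypothetical, the recognition sequences are the standard ones for EcoRI (GAATTC), PstI (CTGCAG) and SalI (GTCGAC), and the sketch does not handle an ORF that also carries an internal SalI site.

```python
from typing import Tuple

# Standard recognition sequences; all three are palindromic, so scanning
# one strand is sufficient.
SITES = {"EcoRI": "GAATTC", "PstI": "CTGCAG", "SalI": "GTCGAC"}


def choose_cloning_sites(orf: str) -> Tuple[str, str]:
    """Return (forward-primer site, reverse-primer site) for an ORF sequence.

    EcoRI/PstI are the defaults; SalI replaces whichever of them
    already occurs inside the ORF.
    """
    orf = orf.upper()
    forward = "SalI" if SITES["EcoRI"] in orf else "EcoRI"
    reverse = "SalI" if SITES["PstI"] in orf else "PstI"
    return forward, reverse


# Hypothetical toy sequences, not taken from the application.
print(choose_cloning_sites("ATGAAACCCGGGTTT"))     # ('EcoRI', 'PstI')
print(choose_cloning_sites("ATGGAATTCCCGGGTTT"))   # ('SalI', 'PstI')
print(choose_cloning_sites("ATGCCCCTGCAGGGTTT"))   # ('EcoRI', 'SalI')
```

The same check would need extending if a different pair of default enzymes were used, as for the BamHI/NdeI/XhoI primers listed earlier in the table.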
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TABLE II - Summary of cloning, expression and purification 



ORF       PCR/cloning  His-fusion expression  GST-fusion expression  Purification
orf1      +            +                      +                      His-fusion
orf2      +            +                      +                      GST-fusion
orf2.1    +            n.d.                   +                      GST-fusion
orf4      +            +                      +                      His-fusion
orf5      +            n.d.                   +                      GST-fusion
orf6      +            +                      +                      GST-fusion
orf7      +            +                      +                      GST-fusion
orf8      +            n.d.                   n.d.
orf9      +            +                      +                      GST-fusion
orf10     +            n.d.                   n.d.
orf11     +            n.d.                   n.d.
orf13     +            n.d.                   +                      GST-fusion
orf15     +            +                      +                      GST-fusion
orf17     +            n.d.                   n.d.
orf18     +            n.d.                   n.d.
orf19     +            n.d.                   n.d.
orf20     +            n.d.                   n.d.
orf22     +            +                      +                      GST-fusion
orf23     +            +                      +                      His-fusion
orf24     +            n.d.                   n.d.
orf25     +            +                      +                      His-fusion
orf26     +            n.d.                   n.d.
orf27     +            +                      +                      GST-fusion
orf28                  +                      +                      GST-fusion
orf29     +            n.d.                   n.d.
orf32     +            +                      +                      His-fusion
orf33     +            n.d.                   n.d.
orf35     +            n.d.                   n.d.
orf37     +            +                      +                      GST-fusion
orf58     +            n.d.                   n.d.
orf65     +            n.d.                   n.d.
orf66     +            n.d.                   n.d.
orf72     +            +                      n.d.                   His-fusion
orf73     +            n.d.                   +                      n.d.
orf75     +            n.d.                   n.d.
orf76                  +                      n.d.                   His-fusion
orf79     +            +                      n.d.                   His-fusion
orf83     +            n.d.                   +                      n.d.
orf84     +            n.d.                   n.d.
orf85     +            n.d.                   +                      GST-fusion
orf89     +            n.d.                   +                      GST-fusion
orf97     +            +                      +                      GST-fusion
orf98     +            n.d.                   n.d.
orf100    +            n.d.                   n.d.
orf101                 n.d.                   n.d.
orf102    +            n.d.                   n.d.
orf103    +            n.d.                   n.d.
orf104    +            n.d.                   n.d.
orf105    +            n.d.                   n.d.
orf106    +            +                      +                      His-fusion
orf109    +            n.d.                   n.d.
orf110    +            n.d.                   n.d.
orf111    +            +                      n.d.                   His-fusion
orf113    +            +                      n.d.                   His-fusion
orf115    n.d.         n.d.                   n.d.
orf119    +            +                      n.d.                   His-fusion
orf120    +            +                      n.d.                   His-fusion
orf121    +            n.d.                   n.d.
orf122    +            +                      n.d.                   His-fusion
orf125    +            +                      n.d.                   His-fusion
orf126    +            +                      n.d.                   His-fusion
orf127    +            +                      n.d.                   His-fusion
orf128    +            n.d.                   n.d.
orf129    +            +                      n.d.                   His-fusion
orf130    +            n.d.                   n.d.
orf131    +            +                      +                      n.d.
orf132    +            +                      +                      His-fusion
orf133    +            n.d.                   +                      GST-fusion
orf134    +            n.d.                   n.d.
orf135    +            n.d.                   n.d.
orf136    +            n.d.                   n.d.
orf137    +            n.d.                   +                      GST-fusion
orf138    +            n.d.                   +                      GST-fusion
orf139    +            n.d.                   n.d.
orf140                 n.d.                   n.d.
orf141    +            n.d.                   n.d.
orf142    +            n.d.                   n.d.
orf143    +            n.d.                   n.d.
orf144    +            n.d.                   +                      n.d.
orf147    +            n.d.                   n.d.
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CLAIMS 

1. A protein comprising an amino acid sequence selected from the group consisting of SEQ
IDs 2, 4, 6, and 8.

2. A nucleic acid molecule which encodes a protein according to claim 1.

3. A nucleic acid molecule according to claim 2, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, and 7. 

4. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 
104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 
144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 
184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 
224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 
264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302,
304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 
344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 
384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 
424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 456, 458, 460, 462, 
464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502, 
504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 
544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 
584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622, 
624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 
664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 
704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 
744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 776, 778, 780, 782, 
784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 
824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 
864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892.



5. A protein having 50% or greater sequence identity to a protein according to claim 4. 
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6. A protein comprising a fragment of an amino acid sequence selected from the group 
consisting of SEQ IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 
44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 
96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134,
136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174,
176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 
216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 
256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 
296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334,
336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374,
376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 
416, 418, 420, 422, 424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 
496, 498, 500, 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534,
536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574,
576, 578, 580, 582, 584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614,
616, 618, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654,
656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 
696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734,
736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774,
776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 
816, 818, 820, 822, 824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 
856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892.

7. An antibody which binds to a protein according to any one of claims 4 to 6. 

8. A nucleic acid molecule which encodes a protein according to any one of claims 4 to 6.

9. A nucleic acid molecule according to claim 8, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 
89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169,
171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 
211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 
251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289,
291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329,
331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369,
371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409,
411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449,
451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489,
491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 
531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 
571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 
611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649,
651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689,
691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 
731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 
771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809,
811, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849,
851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889,
and 891.

10. A nucleic acid molecule comprising a fragment of a nucleotide sequence selected from the 
group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 
41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 
93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133,
135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 
175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 
215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 
255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293,
295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333,
335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 
375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413, 
415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453, 
455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493,
495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533,
535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573, 
575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613, 
615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 
655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693,
695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733,
735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773,
775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 
815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 
855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, and 891.

11. A nucleic acid molecule comprising a nucleotide sequence complementary to a nucleic acid
molecule according to any one of claims 8 to 10. 

12. A nucleic acid molecule comprising a nucleotide sequence having 50% or greater sequence
identity to a nucleic acid molecule according to any one of claims 8-11. 

13. A nucleic acid molecule which can hybridise to a nucleic acid molecule according to any
one of claims 8-12 under high stringency conditions.

14. A composition comprising a protein, a nucleic acid molecule, or an antibody according to 
any preceding claim. 

15. A composition according to claim 14 being a vaccine composition or a diagnostic 
composition. 

16. A composition according to claim 14 or claim 15 for use as a pharmaceutical.

17. The use of a composition according to claim 14 in the manufacture of a medicament for the
treatment or prevention of infection due to Neisserial bacteria. 
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